Visualizing gene enrichment analysis results with GOplot


The GOplot package provides ready-made functions for visualizing the results of gene functional analysis.


#1. Installation

  > install.packages('GOplot')

#2. GOplot built-in data

##2.1 Transcriptome data of brain and heart endothelial cells

  • From paper: Nolan et al. 2013; GEO accession: GSE47067.

| Name | Description | Dimension |
|---|---|---|
| EC$eset | Data frame of normalized expression values of brain and heart endothelial cells (3 replicates) | 20644 x 7 |
| EC$genelist | Data frame of differentially expressed genes (adjusted p-value < 0.05) | 2039 x 7 |
| EC$david | Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID | 174 x 5 |
| EC$genes | Data frame of selected genes with logFC | 37 x 2 |
| EC$process | Character vector of selected enriched biological processes | 7 |

##2.2 Inspecting the built-in data format

  • Load the data

  > library(GOplot)
  > data(EC)

  • View the gene enrichment results

  > head(EC$david, 2)
    Category         ID                    Term
  1       BP GO:0007507       heart development
  2       BP GO:0001944 vasculature development
    Genes
  1 DLC1, NRP2, NRP1, EDN1, PDLIM3, GJA1, TTN, GJA5, ZIC3, TGFB2, CERKL, GATA6, COL4A3BP, GAB1, SEMA3C, MKL2, SLC22A5, MB, PTPRJ, RXRA, VANGL2, MYH6, TNNT2, HHEX, MURC, MIB1, FOXC2, FOXC1, ADAM19, MYL2, TCAP, EGLN1, SOX9, ITGB1, CHD7, HEXIM1, PKD2, NFATC4, PCSK5, ACTC1, TGFBR2, NF1, HSPG2, SMAD3, TBX1, TNNI3, CSRP3, FOXP1, KCNJ8, PLN, TSC2, ATP6V0A1, TGFBR3, HDAC9
  2 GNA13, ACVRL1, NRP1, PGF, IL18, LEPR, EDN1, GJA1, FOXO1, GJA5, TGFB2, WARS, CERKL, APOE, CXCR4, ANG, SEMA3C, NOS2, MKL2, FGF2, RAPGEF1, PTPRJ, RECK, EFNB2, VASH1, PNPLA6, THY1, MIB1, NUS1, FOXC2, FOXC1, CAV1, CDH2, MEIS1, WT1, CDH5, PTK2, FBXW8, CHD7, PLCD1, PLXND1, FIGF, PPAP2B, MAP2K1, TBX4, TGFBR2, NF1, TBX1, TNNI3, LAMA4, MEOX2, ECSCR, HBEGF, AMOT, TGFBR3, HDAC7
    adj_pval
  1 2.17e-06
  2 1.04e-05
  • View the differentially expressed gene list

  > head(EC$genelist)
  ##        ID    logFC   AveExpr        t  P.Value adj.P.Val        B
  ## 1 Slco1a4 6.645388 1.2168670 88.65515 1.32e-18  2.73e-14 29.02715
  ## 2 Slc19a3 6.281525 1.1600468 69.95094 2.41e-17  2.49e-13 27.62917
  ## 3     Ddc 4.483338 0.8365231 65.57836 5.31e-17  3.65e-13 27.18476
  ## 4 Slco1c1 6.469384 1.3558865 59.87613 1.62e-16  8.34e-13 26.51242
  ## 5  Sema3c 5.515630 2.3252117 58.53141 2.14e-16  8.81e-13 26.33626
  ## 6 Slc38a3 4.761755 0.9218670 54.11559 5.58e-16  1.76e-12 25.70308
  • Build the plotting data: circle_dat()

  > circ <- circle_dat(EC$david, EC$genelist)
  > head(circ)
  category         ID              term count  genes      logFC adj_pval     zscore
        BP GO:0007507 heart development    54   DLC1 -0.9707875 2.17e-06 -0.8164966
        BP GO:0007507 heart development    54   NRP2 -1.5153173 2.17e-06 -0.8164966
        BP GO:0007507 heart development    54   NRP1 -1.1412315 2.17e-06 -0.8164966
        BP GO:0007507 heart development    54   EDN1  1.3813006 2.17e-06 -0.8164966
        BP GO:0007507 heart development    54 PDLIM3 -0.8876939 2.17e-06 -0.8164966
        BP GO:0007507 heart development    54   GJA1 -0.8179480 2.17e-06 -0.8164966
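  • zscore: for each GO term, the number of up-regulated genes (logFC > 0) minus the number of down-regulated genes, divided by the square root of the number of genes annotated to that GO term.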
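The z-score definition can be sketched in base R as follows. This is only an illustration of the formula, not the code inside circle_dat(); the data frame `toy` and the helper `zscore_one` are hypothetical names. As a sanity check against the output shown earlier, a count of 54 and a z-score of -0.8164966 are consistent with a difference of -6, since -6 / sqrt(54) ≈ -0.8165.

```r
# Hypothetical toy input: one row per (term, gene) pair, as in the circ output.
toy <- data.frame(
  term  = c("term A", "term A", "term A", "term A", "term B", "term B"),
  genes = c("g1", "g2", "g3", "g4", "g5", "g6"),
  logFC = c(1.2, 0.8, -0.5, 2.0, -1.1, -0.3)
)

# z-score of a single term: (up - down) / sqrt(count)
zscore_one <- function(logfc) {
  up   <- sum(logfc > 0)   # number of up-regulated genes
  down <- sum(logfc < 0)   # number of down-regulated genes
  (up - down) / sqrt(length(logfc))
}

# Apply the formula to each term separately.
zs <- sapply(split(toy$logFC, toy$term), zscore_one)
print(zs)
```

A positive value means most genes annotated to the term are up-regulated; a negative value means most are down-regulated.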


#3. Plotting

##3.1 Bar plot (GOBar())

  • Plot the GO terms in the BP category

  > GOBar(subset(circ, category == 'BP'))

  • Use facets to show the BP, CC and MF GO terms side by side

  > GOBar(circ, display = 'multiple')

##3.2 Bubble plot (GOBubble())

  > GOBubble(circ, labels = 3)

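In this plot, the X axis is the z-score, the Y axis is the negative logarithm of the multiple-testing-adjusted p-value, and the bubble size reflects the number of genes in each GO term.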
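To make the axis mapping concrete, here is a minimal base R sketch that derives the bubble coordinates from a small table of terms. The table `terms` and its values are hypothetical, and the use of base 10 for the logarithm is an assumption; the sketch does not reproduce GOBubble() itself.

```r
# Hypothetical per-term summary (one row per GO term).
terms <- data.frame(
  term     = c("term X", "term Y", "term Z"),
  adj_pval = c(1e-06, 1e-03, 0.05),
  zscore   = c(-1.5, 0.5, 2.0),
  count    = c(40, 12, 5)
)

# Map the columns to plot aesthetics as described in the text.
bubble <- data.frame(
  term = terms$term,
  x    = terms$zscore,            # X axis: z-score
  y    = -log10(terms$adj_pval),  # Y axis: negative log of adjusted p-value (base 10 assumed)
  size = terms$count              # bubble size: number of genes in the term
)
print(bubble)
```

Terms with smaller adjusted p-values therefore sit higher on the plot.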

  • Use facets to show bubble plots for BP, CC and MF side by side

  > GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3)

##3.3 Circle plot of the enrichment results (GOCircle())

  > GOCircle(circ)

By default, the first 10 GO terms in circ are shown; use the nsub parameter to choose which GO terms to display.

  • Select the GO terms to display by GO ID

  > GOCircle(circ, nsub = c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047'))

  • Select the number of GO terms to display

  > GOCircle(circ, nsub = 10)

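##3.4 Chord plot of gene-GO term relationships (GOChord())

chord_dat() converts the plotting data into the input format required by GOChord(): a binary membership matrix in which 1 means the gene belongs to the GO term and 0 means it does not.

  • Select the genes of interest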
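The idea of a binary membership matrix can be illustrated with a few lines of base R. This is a sketch of the data structure only, not the implementation of chord_dat(); the annotation table `annot`, the gene names and the term names are all hypothetical.

```r
# Hypothetical annotations: each row says "this gene belongs to this term".
annot <- data.frame(
  term  = c("term X", "term X", "term Y", "term Y", "term Y"),
  genes = c("geneA", "geneB", "geneA", "geneC", "geneB")
)
genes   <- c("geneA", "geneB", "geneC")
process <- c("term X", "term Y")

# One column per term: 1 if the gene is annotated to it, 0 otherwise.
member <- sapply(process, function(p) {
  as.integer(genes %in% annot$genes[annot$term == p])
})
rownames(member) <- genes
print(member)
```

Rows are genes and columns are GO terms, which matches the layout of the chord output shown in this section (where an extra logFC column is appended).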

  > head(EC$genes)
  ##      ID      logFC
  ## 1  PTK2 -0.6527904
  ## 2 GNA13  0.3711599
  ## 3  LEPR  2.6539788
  ## 4  APOE  0.8698346
  ## 5 CXCR4 -2.5647537
  ## 6  RECK  3.6926860
  • Select the GO terms of interest

  > EC$process
  ## [1] "heart development" "phosphorylation"
  ## [3] "vasculature development" "blood vessel development"
  ## [5] "tissue morphogenesis" "cell adhesion"
  ## [7] "plasma membrane"
  • Build the plotting data

  # chord_dat(data, genes, process)
  # If either genes or process is not specified, all of the corresponding data is used by default
  > chord <- chord_dat(circ, EC$genes, EC$process)
  > head(chord)
  ## heart development phosphorylation vasculature development
  ## PTK2 0 1 1
  ## GNA13 0 0 1
  ## LEPR 0 0 1
  ## APOE 0 0 1
  ## CXCR4 0 0 1
  ## RECK 0 0 1
  ## blood vessel development tissue morphogenesis cell adhesion
  ## PTK2 1 0 0
  ## GNA13 1 0 0
  ## LEPR 1 0 0
  ## APOE 1 0 0
  ## CXCR4 1 0 0
  ## RECK 1 0 0
  ## plasma membrane logFC
  ## PTK2 1 -0.6527904
  ## GNA13 1 0.3711599
  ## LEPR 1 2.6539788
  ## APOE 1 0.8698346
  ## CXCR4 1 -2.5647537
  ## RECK 1 3.6926860
  • Plot

  > chord <- chord_dat(data = circ, genes = EC$genes, process = EC$process)
  > GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5)


  • GOChord() parameters

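  GOChord(data, title, space, gene.order, gene.size, gene.space, nlfc = 1,
          lfc.col, lfc.min, lfc.max, ribbon.col, border.size, process.label, limit)
  # data: binary membership matrix
  # title: plot title
  # space: spacing between the rectangles that correspond to the genes
  # gene.order: order in which the genes are arranged
  # gene.size: size of the gene labels
  # gene.space: space between the gene labels and the logFC segments
  # nlfc: number of logFC columns
  # lfc.col: logFC colours, given as c(color for low values, color for the mid point, color for the high values)
  # lfc.min: minimum logFC value
  # lfc.max: maximum logFC value
  # ribbon.col: vector of colours for the ribbons between genes and GO terms
  # border.size: border width of the ribbons between genes and GO terms
  # process.label: text size of the GO term legend
  # limit: two numbers, e.g. c(3, 2); the first filters genes (keep genes present in at least 3 GO terms), the second filters GO terms (keep GO terms containing at least 2 genes)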
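The effect of the two limit cutoffs can be sketched on a small membership matrix in base R. The matrix `member` and its gene and term names are hypothetical, and filtering genes first and terms second is an assumption made for this illustration; GOChord() may apply the cutoffs differently internally.

```r
# Hypothetical binary membership matrix: rows are genes, columns are GO terms.
member <- matrix(
  c(1, 1, 1,
    1, 0, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("term X", "term Y", "term Z"))
)

limit <- c(2, 2)  # at least 2 terms per gene, at least 2 genes per term

# Step 1 (assumed order): keep genes assigned to at least limit[1] terms.
filtered <- member[rowSums(member) >= limit[1], , drop = FALSE]

# Step 2: keep terms that still contain at least limit[2] genes.
filtered <- filtered[, colSums(filtered) >= limit[2], drop = FALSE]
print(filtered)
```

With these toy values only geneA and geneC, and only term X and term Y, remain.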

##3.5 Heatmap of genes and GO terms (GOHeat())

nlfc = 1: the colour corresponds to logFC.
nlfc = 0: the colour corresponds to the number of GO terms each gene is annotated to.

  > GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))

  • Clustering (GOCluster())
  • GOCluster() calls R's built-in hclust function to perform hierarchical clustering of the genes, either by expression level or by functional category.

  > GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2)
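For readers unfamiliar with hclust, the following base R sketch shows hierarchical clustering of genes by logFC on its own. The logFC values and gene names are hypothetical, and the Euclidean distance and average linkage used here are arbitrary choices for illustration, not necessarily what GOCluster() uses.

```r
# Hypothetical logFC values for five genes.
logfc <- c(geneA = 2.1, geneB = 1.9, geneC = -1.5, geneD = -1.7, geneE = 0.1)

# Pairwise Euclidean distances between genes based on logFC.
d <- dist(logfc)

# Agglomerative hierarchical clustering (average linkage chosen for illustration).
hc <- hclust(d, method = "average")

# Cut the dendrogram into three groups and show the assignment per gene.
groups <- cutree(hc, k = 3)
print(groups)
```

Genes with similar logFC end up in the same group; calling plot(hc) would draw the corresponding dendrogram.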
