Gene set analysis

Last updated on 2023-03-14 | Edit this page

Estimated time NA minutes

Overview

Questions

  • How do we find differentially expressed pathways?

Objectives

  • Explain how to find differentially expressed pathways with gene set analysis in R.
  • Understand how differentially expressed genes can enrich a gene set.
  • Explain how to perform a gene set analysis in R, using clusterProfiler.

Contribute!

This episode is intended to introduce the concept of how to carry out a functional analysis of a subset of differentially expressed (DE) genes, by means of assessing how significantly DE genes enrich gene sets of our interest.

First, we are going to explore the basic concept of enriching a gene set with differentially expressed (DE) genes. Recall the differential expression analysis.

R

library(SummarizedExperiment)
library(DESeq2)

R

se <- readRDS("data/GSE96870_se.rds")

R

dds <- DESeq2::DESeqDataSet(se[, se$tissue == "Cerebellum"],
                            design = ~ sex + time)

WARNING

Warning in DESeq2::DESeqDataSet(se[, se$tissue == "Cerebellum"], design = ~sex
+ : some variables in design formula are characters, converting to factors

R

dds <- DESeq2::DESeq(dds)

Fetch results for the contrast between male and female mice.

R

resSex <- DESeq2::results(dds, contrast = c("sex", "Male", "Female"))
summary(resSex)

OUTPUT


out of 32652 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 53, 0.16%
LFC < 0 (down)     : 71, 0.22%
outliers [1]       : 10, 0.031%
low counts [2]     : 13717, 42%
(mean count < 6)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Select DE genes between males and females with FDR < 5%.

R

sexDE <- as.data.frame(subset(resSex, padj < 0.05))
dim(sexDE)

OUTPUT

[1] 54  6

R

sexDE <- sexDE[order(abs(sexDE$log2FoldChange), decreasing=TRUE), ]
head(sexDE)

OUTPUT

               baseMean log2FoldChange     lfcSE      stat        pvalue
Eif2s3y       1410.8750       12.62514 0.5652155  22.33685 1.620659e-110
Kdm5d          692.1672       12.55386 0.5936267  21.14773  2.895664e-99
Uty            667.4375       12.01728 0.5935911  20.24505  3.927797e-91
Ddx3y         2072.9436       11.87241 0.3974927  29.86825 5.087220e-196
Xist         22603.0359      -11.60429 0.3362822 -34.50761 6.168523e-261
LOC105243748    52.9669        9.08325 0.5976242  15.19893  3.594320e-52
                      padj
Eif2s3y      1.022366e-106
Kdm5d         1.370011e-95
Uty           1.486671e-87
Ddx3y        4.813782e-192
Xist         1.167393e-256
LOC105243748  1.133708e-48

R

sexDEgenes <- rownames(sexDE)
head(sexDEgenes)

OUTPUT

[1] "Eif2s3y"      "Kdm5d"        "Uty"          "Ddx3y"        "Xist"        
[6] "LOC105243748"

R

length(sexDEgenes)

OUTPUT

[1] 54

Enrichment of a curated gene set


Contribute!

Here we illustrate how to assess the enrichment of one gene set we curate ourselves with our subset of DE genes with sex-specific expression. Here we form such a gene set with genes from sex chromosomes. Could you think of another more accurate gene set formed by genes with sex-specific expression?

Build a gene set formed by genes located in the sex chromosomes X and Y.

R

xygenes <- rownames(se)[decode(seqnames(rowRanges(se)) %in% c("X", "Y"))]
length(xygenes)

OUTPUT

[1] 2123

Build a contingency table and conduct a one-tailed Fisher’s exact test that verifies the association between genes being DE between male and female mice and being located in a sex chromosome.

R

N <- nrow(se)
n <- length(sexDEgenes)
m <- length(xygenes)
k <- length(intersect(xygenes, sexDEgenes)) 
dnames <- list(GS=c("inside", "outside"), DE=c("yes", "no"))
t <- matrix(c(k, n-k, m-k, N+k-n-m),
                        nrow=2, ncol=2, dimnames=dnames)
t

OUTPUT

         DE
GS        yes    no
  inside   18  2105
  outside  36 39627

R

fisher.test(t, alternative="greater")

OUTPUT


	Fisher's Exact Test for Count Data

data:  t
p-value = 7.944e-11
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 5.541517      Inf
sample estimates:
odds ratio 
  9.411737 

Gene ontology analysis with clusterProfiler


Contribute!

Here we illustrate how to assess the enrichment on the entire collection of Gene Ontology (GO) gene sets using the package clusterProfiler. Could we illustrate any missing important feature of this package for this analysis objective? Could we briefly mention other packages that may be useful for this task?

Second, let’s perform a gene set analysis for an entire collection of gene sets using the Bioconductor package clusterProfiler. For this purpose, we will fetch the results for the contrast between two time points.

R

resTime <- DESeq2::results(dds, contrast = c("time", "Day8", "Day0"))
summary(resTime)

OUTPUT


out of 32652 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 4472, 14%
LFC < 0 (down)     : 4276, 13%
outliers [1]       : 10, 0.031%
low counts [2]     : 8732, 27%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Select DE genes between Day0 and Day8 with FDR < 5% and minimum 1.5-fold change.

R

timeDE <- as.data.frame(subset(resTime, padj < 0.05 & abs(log2FoldChange) > log2(1.5)))
dim(timeDE)

OUTPUT

[1] 2110    6

R

timeDE <- timeDE[order(abs(timeDE$log2FoldChange), decreasing=TRUE), ]
head(timeDE)

OUTPUT

              baseMean log2FoldChange     lfcSE      stat       pvalue
LOC105245444  2.441873       4.768938 0.9013067  5.291138 1.215573e-07
LOC105246405  9.728219       4.601505 0.6101832  7.541186 4.657174e-14
4933427D06Rik 1.480365       4.556126 1.0318402  4.415535 1.007607e-05
A930006I01Rik 2.312732      -4.353155 0.9176026 -4.744053 2.094837e-06
LOC105245223  3.272536       4.337202 0.8611255  5.036666 4.737099e-07
A530053G22Rik 1.554735       4.243903 1.0248977  4.140806 3.460875e-05
                      padj
LOC105245444  1.800765e-06
LOC105246405  2.507951e-12
4933427D06Rik 9.169093e-05
A930006I01Rik 2.252139e-05
LOC105245223  6.047199e-06
A530053G22Rik 2.720142e-04

R

timeDEgenes <- rownames(timeDE)
head(timeDEgenes)

OUTPUT

[1] "LOC105245444"  "LOC105246405"  "4933427D06Rik" "A930006I01Rik"
[5] "LOC105245223"  "A530053G22Rik"

R

length(timeDEgenes)

OUTPUT

[1] 2110

Call the enrichGO() function from clusterProfiler as follows.

R

library(clusterProfiler)
library(org.Mm.eg.db)
library(enrichplot)

resTimeGO <- enrichGO(gene = timeDEgenes,
                      keyType = "SYMBOL",
                      universe = rownames(se),
                      OrgDb = org.Mm.eg.db,
                      ont = "BP",
                      pvalueCutoff = 0.01,
                      qvalueCutoff = 0.01)
dim(resTimeGO)

OUTPUT

[1] 17  9

R

head(resTimeGO)

OUTPUT

                   ID                                           Description
GO:0035456 GO:0035456                           response to interferon-beta
GO:0071674 GO:0071674                            mononuclear cell migration
GO:0035458 GO:0035458                  cellular response to interferon-beta
GO:0050900 GO:0050900                                   leukocyte migration
GO:0030595 GO:0030595                                  leukocyte chemotaxis
GO:0002523 GO:0002523 leukocyte migration involved in inflammatory response
           GeneRatio   BgRatio       pvalue     p.adjust       qvalue
GO:0035456   17/1260  54/21417 6.434058e-09 3.254347e-05 3.082930e-05
GO:0071674   32/1260 179/21417 1.596474e-08 4.037482e-05 3.824814e-05
GO:0035458   15/1260  47/21417 4.064983e-08 6.749342e-05 6.393832e-05
GO:0050900   50/1260 375/21417 5.337558e-08 6.749342e-05 6.393832e-05
GO:0030595   34/1260 222/21417 2.880367e-07 2.913780e-04 2.760302e-04
GO:0002523   10/1260  25/21417 6.944905e-07 5.854555e-04 5.546176e-04
                                                                                                                                                                                                                                                                                             geneID
GO:0035456                                                                                                                                                                            Tgtp1/Tgtp2/F830016B08Rik/Iigp1/Ifitm6/Igtp/Gm4951/Bst2/Oas1c/Irgm1/Gbp6/Ifi47/Aim2/Ifitm7/Irgm2/Ifit1/Ifi204
GO:0071674                                                                                                    Tnfsf18/Aire/Ccl17/Ccr7/Nlrp12/Ccl2/Retnlg/Apod/Il12a/Ccl5/Fut7/Ccl7/Spn/Itgb3/Grem1/Ptk2b/Lgals3/Adam8/Dusp1/Ch25h/Nbl1/Alox5/Padi2/Plg/Calr/Ager/Slamf9/Ccl6/Mdk/Itga4/Hsd3b7/Trpm4
GO:0035458                                                                                                                                                                                        Tgtp1/Tgtp2/F830016B08Rik/Iigp1/Igtp/Gm4951/Oas1c/Irgm1/Gbp6/Ifi47/Aim2/Ifitm7/Irgm2/Ifit1/Ifi204
GO:0050900 Tnfsf18/Aire/Ccl17/Ccr7/Nlrp12/Bst1/Ccl2/Retnlg/Ppbp/Cxcl5/Apod/Il12a/Ccl5/Umodl1/Fut7/Ccl7/Ccl28/Spn/Sell/Itgb3/Grem1/Cxcl1/Ptk2b/Lgals3/Adam8/Pf4/Dusp1/Ch25h/S100a8/Nbl1/Alox5/Padi2/Plg/Edn3/Il33/Ptn/Ada/Enpp1/Calr/Ager/Slamf9/Ccl6/Prex1/Aoc3/Itgam/Mdk/Itga4/Hsd3b7/P2ry12/Trpm4
GO:0030595                                                                                        Tnfsf18/Ccl17/Ccr7/Bst1/Ccl2/Retnlg/Ppbp/Cxcl5/Il12a/Ccl5/Ccl7/Sell/Grem1/Cxcl1/Ptk2b/Lgals3/Adam8/Pf4/Dusp1/Ch25h/S100a8/Nbl1/Alox5/Padi2/Edn3/Ptn/Calr/Slamf9/Ccl6/Prex1/Itgam/Mdk/Hsd3b7/Trpm4
GO:0002523                                                                                                                                                                                                                                     Ccl2/Ppbp/Fut7/Adam8/S100a8/Alox5/Ptn/Aoc3/Itgam/Mdk
           Count
GO:0035456    17
GO:0071674    32
GO:0035458    15
GO:0050900    50
GO:0030595    34
GO:0002523    10

Let’s build a more readable table of results.

R

library(kableExtra)

resTimeGOtab <- as.data.frame(resTimeGO)
resTimeGOtab$ID <- NULL
resTimeGOtab$geneID <- sapply(strsplit(resTimeGO$geneID, "/"), paste, collapse=", ")
ktab <- kable(resTimeGOtab, row.names=TRUE, caption="GO results for DE genes between time points.")
kable_styling(ktab, bootstrap_options=c("stripped", "hover", "responsive"), fixed_thead=TRUE)
GO results for DE genes between time points.
Description GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
GO:0035456 response to interferon-beta 17/1260 54/21417 0.00e+00 0.0000325 0.0000308 Tgtp1, Tgtp2, F830016B08Rik, Iigp1, Ifitm6, Igtp, Gm4951, Bst2, Oas1c, Irgm1, Gbp6, Ifi47, Aim2, Ifitm7, Irgm2, Ifit1, Ifi204 17
GO:0071674 mononuclear cell migration 32/1260 179/21417 0.00e+00 0.0000404 0.0000382 Tnfsf18, Aire, Ccl17, Ccr7, Nlrp12, Ccl2, Retnlg, Apod, Il12a, Ccl5, Fut7, Ccl7, Spn, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Ch25h, Nbl1, Alox5, Padi2, Plg, Calr, Ager, Slamf9, Ccl6, Mdk, Itga4, Hsd3b7, Trpm4 32
GO:0035458 cellular response to interferon-beta 15/1260 47/21417 0.00e+00 0.0000675 0.0000639 Tgtp1, Tgtp2, F830016B08Rik, Iigp1, Igtp, Gm4951, Oas1c, Irgm1, Gbp6, Ifi47, Aim2, Ifitm7, Irgm2, Ifit1, Ifi204 15
GO:0050900 leukocyte migration 50/1260 375/21417 1.00e-07 0.0000675 0.0000639 Tnfsf18, Aire, Ccl17, Ccr7, Nlrp12, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Apod, Il12a, Ccl5, Umodl1, Fut7, Ccl7, Ccl28, Spn, Sell, Itgb3, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Plg, Edn3, Il33, Ptn, Ada, Enpp1, Calr, Ager, Slamf9, Ccl6, Prex1, Aoc3, Itgam, Mdk, Itga4, Hsd3b7, P2ry12, Trpm4 50
GO:0030595 leukocyte chemotaxis 34/1260 222/21417 3.00e-07 0.0002914 0.0002760 Tnfsf18, Ccl17, Ccr7, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Il12a, Ccl5, Ccl7, Sell, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Edn3, Ptn, Calr, Slamf9, Ccl6, Prex1, Itgam, Mdk, Hsd3b7, Trpm4 34
GO:0002523 leukocyte migration involved in inflammatory response 10/1260 25/21417 7.00e-07 0.0005855 0.0005546 Ccl2, Ppbp, Fut7, Adam8, S100a8, Alox5, Ptn, Aoc3, Itgam, Mdk 10
GO:0050953 sensory perception of light stimulus 26/1260 162/21417 2.90e-06 0.0020909 0.0019808 Aipl1, Vsx2, Nxnl2, Lrit3, Cryba2, Bfsp2, Lrat, Gabrr2, Lum, Rlbp1, Pde6g, Gpr179, Col1a1, Cplx3, Best1, Ush1g, Rs1, Rdh5, Guca1b, Th, Ppef2, Rbp4, Olfm2, Rom1, Vsx1, Rpe65 26
GO:0071675 regulation of mononuclear cell migration 21/1260 117/21417 4.20e-06 0.0026851 0.0025436 Tnfsf18, Aire, Ccr7, Ccl2, Apod, Il12a, Ccl5, Ccl7, Spn, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Nbl1, Padi2, Calr, Ager, Mdk, Itga4 21
GO:0007601 visual perception 25/1260 158/21417 5.80e-06 0.0032508 0.0030795 Aipl1, Vsx2, Nxnl2, Lrit3, Cryba2, Bfsp2, Lrat, Gabrr2, Lum, Rlbp1, Pde6g, Gpr179, Col1a1, Cplx3, Best1, Rs1, Rdh5, Guca1b, Th, Ppef2, Rbp4, Olfm2, Rom1, Vsx1, Rpe65 25
GO:0034341 response to interferon-gamma 22/1260 135/21417 1.28e-05 0.0056936 0.0053937 Ccl17, Gbp4, Ccl2, Tgtp1, H2-Q7, Il12rb1, Ifitm6, Ccl5, Ccl7, Igtp, Nos2, Nlrc5, Bst2, Irgm1, Gbp6, Capg, Ifitm7, Gbp9, Gbp5, Irgm2, Ccl6, Aqp4 22
GO:0060326 cell chemotaxis 38/1260 308/21417 1.30e-05 0.0056936 0.0053937 Tnfsf18, Ccl17, Ccr7, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Nr4a1, Il12a, Ccl5, Ccl7, Ccl28, Sell, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Edn3, Ptn, Plxnb3, Calr, Lpar1, Slamf9, Ccl6, Prex1, Itgam, Mdk, Hsd3b7, Trpm4 38
GO:0002685 regulation of leukocyte migration 31/1260 230/21417 1.41e-05 0.0056936 0.0053937 Tnfsf18, Aire, Ccr7, Bst1, Ccl2, Apod, Il12a, Ccl5, Fut7, Ccl7, Ccl28, Spn, Sell, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Nbl1, Padi2, Edn3, Il33, Ptn, Ada, Calr, Ager, Aoc3, Mdk, Itga4, P2ry12 31
GO:0036336 dendritic cell migration 9/1260 27/21417 1.46e-05 0.0056936 0.0053937 Tnfsf18, Ccr7, Nlrp12, Retnlg, Il12a, Alox5, Calr, Slamf9, Trpm4 9
GO:1990266 neutrophil migration 21/1260 127/21417 1.59e-05 0.0057426 0.0054401 Ccl17, Ccr7, Bst1, Ccl2, Ppbp, Cxcl5, Ccl5, Umodl1, Fut7, Ccl7, Sell, Cxcl1, Lgals3, Adam8, Pf4, S100a8, Edn3, Ccl6, Prex1, Itgam, Mdk 21
GO:0016264 gap junction assembly 7/1260 16/21417 1.71e-05 0.0057811 0.0054766 Gjd3, Aplnr, Gja5, Agt, Hopx, Cav1, Ace 7
GO:0030593 neutrophil chemotaxis 18/1260 100/21417 1.94e-05 0.0061338 0.0058107 Ccl17, Ccr7, Bst1, Ccl2, Ppbp, Cxcl5, Ccl5, Ccl7, Sell, Cxcl1, Lgals3, Pf4, S100a8, Edn3, Ccl6, Prex1, Itgam, Mdk 18
GO:1903028 positive regulation of opsonization 6/1260 12/21417 2.78e-05 0.0082825 0.0078462 Pla2g5, Masp2, C4b, Colec11, C3, Cfp 6

Keypoints

  • Key point 1