Gene set analysis
Last updated on 2023-03-14 | Edit this page
Overview
Questions
- How do we find differentially expressed pathways?
Objectives
- Explain how to find differentially expressed pathways with gene set analysis in R.
- Understand how differentially expressed genes can enrich a gene set.
- Explain how to perform a gene set analysis in R, using clusterProfiler.
First, we are going to explore the basic concept of enriching a gene set with differentially expressed (DE) genes. Recall the differential expression analysis.
R
library(SummarizedExperiment)
library(DESeq2)
R
se <- readRDS("data/GSE96870_se.rds")
R
dds <- DESeq2::DESeqDataSet(se[, se$tissue == "Cerebellum"],
design = ~ sex + time)
WARNING
Warning in DESeq2::DESeqDataSet(se[, se$tissue == "Cerebellum"], design = ~sex
+ : some variables in design formula are characters, converting to factors
R
dds <- DESeq2::DESeq(dds)
Fetch results for the contrast between male and female mice.
R
resSex <- DESeq2::results(dds, contrast = c("sex", "Male", "Female"))
summary(resSex)
OUTPUT
out of 32652 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 53, 0.16%
LFC < 0 (down) : 71, 0.22%
outliers [1] : 10, 0.031%
low counts [2] : 13717, 42%
(mean count < 6)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Select DE genes between males and females with FDR < 5%.
R
sexDE <- as.data.frame(subset(resSex, padj < 0.05))
dim(sexDE)
OUTPUT
[1] 54 6
R
sexDE <- sexDE[order(abs(sexDE$log2FoldChange), decreasing=TRUE), ]
head(sexDE)
OUTPUT
baseMean log2FoldChange lfcSE stat pvalue
Eif2s3y 1410.8750 12.62514 0.5652155 22.33685 1.620659e-110
Kdm5d 692.1672 12.55386 0.5936267 21.14773 2.895664e-99
Uty 667.4375 12.01728 0.5935911 20.24505 3.927797e-91
Ddx3y 2072.9436 11.87241 0.3974927 29.86825 5.087220e-196
Xist 22603.0359 -11.60429 0.3362822 -34.50761 6.168523e-261
LOC105243748 52.9669 9.08325 0.5976242 15.19893 3.594320e-52
padj
Eif2s3y 1.022366e-106
Kdm5d 1.370011e-95
Uty 1.486671e-87
Ddx3y 4.813782e-192
Xist 1.167393e-256
LOC105243748 1.133708e-48
R
sexDEgenes <- rownames(sexDE)
head(sexDEgenes)
OUTPUT
[1] "Eif2s3y" "Kdm5d" "Uty" "Ddx3y" "Xist"
[6] "LOC105243748"
R
length(sexDEgenes)
OUTPUT
[1] 54
Enrichment of a curated gene set
Contribute!
Here we illustrate how to assess the enrichment of one gene set we curate ourselves with our subset of DE genes with sex-specific expression. Here we form such a gene set with genes from sex chromosomes. Could you think of another more accurate gene set formed by genes with sex-specific expression?
Build a gene set formed by genes located in the sex chromosomes X and Y.
R
xygenes <- rownames(se)[decode(seqnames(rowRanges(se)) %in% c("X", "Y"))]
length(xygenes)
OUTPUT
[1] 2123
Build a contingency table and conduct a one-tailed Fisher’s exact test that verifies the association between genes being DE between male and female mice and being located in a sex chromosome.
R
N <- nrow(se)
n <- length(sexDEgenes)
m <- length(xygenes)
k <- length(intersect(xygenes, sexDEgenes))
dnames <- list(GS=c("inside", "outside"), DE=c("yes", "no"))
t <- matrix(c(k, n-k, m-k, N+k-n-m),
nrow=2, ncol=2, dimnames=dnames)
t
OUTPUT
DE
GS yes no
inside 18 2105
outside 36 39627
R
fisher.test(t, alternative="greater")
OUTPUT
Fisher's Exact Test for Count Data
data: t
p-value = 7.944e-11
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
5.541517 Inf
sample estimates:
odds ratio
9.411737
Gene ontology analysis with clusterProfiler
Contribute!
Here we illustrate how to assess the enrichment on the entire collection of Gene Ontology (GO) gene sets using the package clusterProfiler. Could we illustrate any missing important feature of this package for this analysis objective? Could we briefly mention other packages that may be useful for this task?
Second, let’s perform a gene set analysis for an entire collection of gene sets using the Bioconductor package clusterProfiler. For this purpose, we will fetch the results for the contrast between two time points.
R
resTime <- DESeq2::results(dds, contrast = c("time", "Day8", "Day0"))
summary(resTime)
OUTPUT
out of 32652 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 4472, 14%
LFC < 0 (down) : 4276, 13%
outliers [1] : 10, 0.031%
low counts [2] : 8732, 27%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Select DE genes between Day0
and Day8
with
FDR < 5% and minimum 1.5-fold change.
R
timeDE <- as.data.frame(subset(resTime, padj < 0.05 & abs(log2FoldChange) > log2(1.5)))
dim(timeDE)
OUTPUT
[1] 2110 6
R
timeDE <- timeDE[order(abs(timeDE$log2FoldChange), decreasing=TRUE), ]
head(timeDE)
OUTPUT
baseMean log2FoldChange lfcSE stat pvalue
LOC105245444 2.441873 4.768938 0.9013067 5.291138 1.215573e-07
LOC105246405 9.728219 4.601505 0.6101832 7.541186 4.657174e-14
4933427D06Rik 1.480365 4.556126 1.0318402 4.415535 1.007607e-05
A930006I01Rik 2.312732 -4.353155 0.9176026 -4.744053 2.094837e-06
LOC105245223 3.272536 4.337202 0.8611255 5.036666 4.737099e-07
A530053G22Rik 1.554735 4.243903 1.0248977 4.140806 3.460875e-05
padj
LOC105245444 1.800765e-06
LOC105246405 2.507951e-12
4933427D06Rik 9.169093e-05
A930006I01Rik 2.252139e-05
LOC105245223 6.047199e-06
A530053G22Rik 2.720142e-04
R
timeDEgenes <- rownames(timeDE)
head(timeDEgenes)
OUTPUT
[1] "LOC105245444" "LOC105246405" "4933427D06Rik" "A930006I01Rik"
[5] "LOC105245223" "A530053G22Rik"
R
length(timeDEgenes)
OUTPUT
[1] 2110
Call the enrichGO()
function from clusterProfiler
as follows.
R
library(clusterProfiler)
library(org.Mm.eg.db)
library(enrichplot)
resTimeGO <- enrichGO(gene = timeDEgenes,
keyType = "SYMBOL",
universe = rownames(se),
OrgDb = org.Mm.eg.db,
ont = "BP",
pvalueCutoff = 0.01,
qvalueCutoff = 0.01)
dim(resTimeGO)
OUTPUT
[1] 17 9
R
head(resTimeGO)
OUTPUT
ID Description
GO:0035456 GO:0035456 response to interferon-beta
GO:0071674 GO:0071674 mononuclear cell migration
GO:0035458 GO:0035458 cellular response to interferon-beta
GO:0050900 GO:0050900 leukocyte migration
GO:0030595 GO:0030595 leukocyte chemotaxis
GO:0002523 GO:0002523 leukocyte migration involved in inflammatory response
GeneRatio BgRatio pvalue p.adjust qvalue
GO:0035456 17/1260 54/21417 6.434058e-09 3.254347e-05 3.082930e-05
GO:0071674 32/1260 179/21417 1.596474e-08 4.037482e-05 3.824814e-05
GO:0035458 15/1260 47/21417 4.064983e-08 6.749342e-05 6.393832e-05
GO:0050900 50/1260 375/21417 5.337558e-08 6.749342e-05 6.393832e-05
GO:0030595 34/1260 222/21417 2.880367e-07 2.913780e-04 2.760302e-04
GO:0002523 10/1260 25/21417 6.944905e-07 5.854555e-04 5.546176e-04
geneID
GO:0035456 Tgtp1/Tgtp2/F830016B08Rik/Iigp1/Ifitm6/Igtp/Gm4951/Bst2/Oas1c/Irgm1/Gbp6/Ifi47/Aim2/Ifitm7/Irgm2/Ifit1/Ifi204
GO:0071674 Tnfsf18/Aire/Ccl17/Ccr7/Nlrp12/Ccl2/Retnlg/Apod/Il12a/Ccl5/Fut7/Ccl7/Spn/Itgb3/Grem1/Ptk2b/Lgals3/Adam8/Dusp1/Ch25h/Nbl1/Alox5/Padi2/Plg/Calr/Ager/Slamf9/Ccl6/Mdk/Itga4/Hsd3b7/Trpm4
GO:0035458 Tgtp1/Tgtp2/F830016B08Rik/Iigp1/Igtp/Gm4951/Oas1c/Irgm1/Gbp6/Ifi47/Aim2/Ifitm7/Irgm2/Ifit1/Ifi204
GO:0050900 Tnfsf18/Aire/Ccl17/Ccr7/Nlrp12/Bst1/Ccl2/Retnlg/Ppbp/Cxcl5/Apod/Il12a/Ccl5/Umodl1/Fut7/Ccl7/Ccl28/Spn/Sell/Itgb3/Grem1/Cxcl1/Ptk2b/Lgals3/Adam8/Pf4/Dusp1/Ch25h/S100a8/Nbl1/Alox5/Padi2/Plg/Edn3/Il33/Ptn/Ada/Enpp1/Calr/Ager/Slamf9/Ccl6/Prex1/Aoc3/Itgam/Mdk/Itga4/Hsd3b7/P2ry12/Trpm4
GO:0030595 Tnfsf18/Ccl17/Ccr7/Bst1/Ccl2/Retnlg/Ppbp/Cxcl5/Il12a/Ccl5/Ccl7/Sell/Grem1/Cxcl1/Ptk2b/Lgals3/Adam8/Pf4/Dusp1/Ch25h/S100a8/Nbl1/Alox5/Padi2/Edn3/Ptn/Calr/Slamf9/Ccl6/Prex1/Itgam/Mdk/Hsd3b7/Trpm4
GO:0002523 Ccl2/Ppbp/Fut7/Adam8/S100a8/Alox5/Ptn/Aoc3/Itgam/Mdk
Count
GO:0035456 17
GO:0071674 32
GO:0035458 15
GO:0050900 50
GO:0030595 34
GO:0002523 10
Let’s build a more readable table of results.
R
library(kableExtra)
resTimeGOtab <- as.data.frame(resTimeGO)
resTimeGOtab$ID <- NULL
resTimeGOtab$geneID <- sapply(strsplit(resTimeGO$geneID, "/"), paste, collapse=", ")
ktab <- kable(resTimeGOtab, row.names=TRUE, caption="GO results for DE genes between time points.")
kable_styling(ktab, bootstrap_options=c("stripped", "hover", "responsive"), fixed_thead=TRUE)
Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | geneID | Count | |
---|---|---|---|---|---|---|---|---|
GO:0035456 | response to interferon-beta | 17/1260 | 54/21417 | 0.00e+00 | 0.0000325 | 0.0000308 | Tgtp1, Tgtp2, F830016B08Rik, Iigp1, Ifitm6, Igtp, Gm4951, Bst2, Oas1c, Irgm1, Gbp6, Ifi47, Aim2, Ifitm7, Irgm2, Ifit1, Ifi204 | 17 |
GO:0071674 | mononuclear cell migration | 32/1260 | 179/21417 | 0.00e+00 | 0.0000404 | 0.0000382 | Tnfsf18, Aire, Ccl17, Ccr7, Nlrp12, Ccl2, Retnlg, Apod, Il12a, Ccl5, Fut7, Ccl7, Spn, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Ch25h, Nbl1, Alox5, Padi2, Plg, Calr, Ager, Slamf9, Ccl6, Mdk, Itga4, Hsd3b7, Trpm4 | 32 |
GO:0035458 | cellular response to interferon-beta | 15/1260 | 47/21417 | 0.00e+00 | 0.0000675 | 0.0000639 | Tgtp1, Tgtp2, F830016B08Rik, Iigp1, Igtp, Gm4951, Oas1c, Irgm1, Gbp6, Ifi47, Aim2, Ifitm7, Irgm2, Ifit1, Ifi204 | 15 |
GO:0050900 | leukocyte migration | 50/1260 | 375/21417 | 1.00e-07 | 0.0000675 | 0.0000639 | Tnfsf18, Aire, Ccl17, Ccr7, Nlrp12, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Apod, Il12a, Ccl5, Umodl1, Fut7, Ccl7, Ccl28, Spn, Sell, Itgb3, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Plg, Edn3, Il33, Ptn, Ada, Enpp1, Calr, Ager, Slamf9, Ccl6, Prex1, Aoc3, Itgam, Mdk, Itga4, Hsd3b7, P2ry12, Trpm4 | 50 |
GO:0030595 | leukocyte chemotaxis | 34/1260 | 222/21417 | 3.00e-07 | 0.0002914 | 0.0002760 | Tnfsf18, Ccl17, Ccr7, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Il12a, Ccl5, Ccl7, Sell, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Edn3, Ptn, Calr, Slamf9, Ccl6, Prex1, Itgam, Mdk, Hsd3b7, Trpm4 | 34 |
GO:0002523 | leukocyte migration involved in inflammatory response | 10/1260 | 25/21417 | 7.00e-07 | 0.0005855 | 0.0005546 | Ccl2, Ppbp, Fut7, Adam8, S100a8, Alox5, Ptn, Aoc3, Itgam, Mdk | 10 |
GO:0050953 | sensory perception of light stimulus | 26/1260 | 162/21417 | 2.90e-06 | 0.0020909 | 0.0019808 | Aipl1, Vsx2, Nxnl2, Lrit3, Cryba2, Bfsp2, Lrat, Gabrr2, Lum, Rlbp1, Pde6g, Gpr179, Col1a1, Cplx3, Best1, Ush1g, Rs1, Rdh5, Guca1b, Th, Ppef2, Rbp4, Olfm2, Rom1, Vsx1, Rpe65 | 26 |
GO:0071675 | regulation of mononuclear cell migration | 21/1260 | 117/21417 | 4.20e-06 | 0.0026851 | 0.0025436 | Tnfsf18, Aire, Ccr7, Ccl2, Apod, Il12a, Ccl5, Ccl7, Spn, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Nbl1, Padi2, Calr, Ager, Mdk, Itga4 | 21 |
GO:0007601 | visual perception | 25/1260 | 158/21417 | 5.80e-06 | 0.0032508 | 0.0030795 | Aipl1, Vsx2, Nxnl2, Lrit3, Cryba2, Bfsp2, Lrat, Gabrr2, Lum, Rlbp1, Pde6g, Gpr179, Col1a1, Cplx3, Best1, Rs1, Rdh5, Guca1b, Th, Ppef2, Rbp4, Olfm2, Rom1, Vsx1, Rpe65 | 25 |
GO:0034341 | response to interferon-gamma | 22/1260 | 135/21417 | 1.28e-05 | 0.0056936 | 0.0053937 | Ccl17, Gbp4, Ccl2, Tgtp1, H2-Q7, Il12rb1, Ifitm6, Ccl5, Ccl7, Igtp, Nos2, Nlrc5, Bst2, Irgm1, Gbp6, Capg, Ifitm7, Gbp9, Gbp5, Irgm2, Ccl6, Aqp4 | 22 |
GO:0060326 | cell chemotaxis | 38/1260 | 308/21417 | 1.30e-05 | 0.0056936 | 0.0053937 | Tnfsf18, Ccl17, Ccr7, Bst1, Ccl2, Retnlg, Ppbp, Cxcl5, Nr4a1, Il12a, Ccl5, Ccl7, Ccl28, Sell, Grem1, Cxcl1, Ptk2b, Lgals3, Adam8, Pf4, Dusp1, Ch25h, S100a8, Nbl1, Alox5, Padi2, Edn3, Ptn, Plxnb3, Calr, Lpar1, Slamf9, Ccl6, Prex1, Itgam, Mdk, Hsd3b7, Trpm4 | 38 |
GO:0002685 | regulation of leukocyte migration | 31/1260 | 230/21417 | 1.41e-05 | 0.0056936 | 0.0053937 | Tnfsf18, Aire, Ccr7, Bst1, Ccl2, Apod, Il12a, Ccl5, Fut7, Ccl7, Ccl28, Spn, Sell, Itgb3, Grem1, Ptk2b, Lgals3, Adam8, Dusp1, Nbl1, Padi2, Edn3, Il33, Ptn, Ada, Calr, Ager, Aoc3, Mdk, Itga4, P2ry12 | 31 |
GO:0036336 | dendritic cell migration | 9/1260 | 27/21417 | 1.46e-05 | 0.0056936 | 0.0053937 | Tnfsf18, Ccr7, Nlrp12, Retnlg, Il12a, Alox5, Calr, Slamf9, Trpm4 | 9 |
GO:1990266 | neutrophil migration | 21/1260 | 127/21417 | 1.59e-05 | 0.0057426 | 0.0054401 | Ccl17, Ccr7, Bst1, Ccl2, Ppbp, Cxcl5, Ccl5, Umodl1, Fut7, Ccl7, Sell, Cxcl1, Lgals3, Adam8, Pf4, S100a8, Edn3, Ccl6, Prex1, Itgam, Mdk | 21 |
GO:0016264 | gap junction assembly | 7/1260 | 16/21417 | 1.71e-05 | 0.0057811 | 0.0054766 | Gjd3, Aplnr, Gja5, Agt, Hopx, Cav1, Ace | 7 |
GO:0030593 | neutrophil chemotaxis | 18/1260 | 100/21417 | 1.94e-05 | 0.0061338 | 0.0058107 | Ccl17, Ccr7, Bst1, Ccl2, Ppbp, Cxcl5, Ccl5, Ccl7, Sell, Cxcl1, Lgals3, Pf4, S100a8, Edn3, Ccl6, Prex1, Itgam, Mdk | 18 |
GO:1903028 | positive regulation of opsonization | 6/1260 | 12/21417 | 2.78e-05 | 0.0082825 | 0.0078462 | Pla2g5, Masp2, C4b, Colec11, C3, Cfp | 6 |