The example data includes a pre-computed differential expression dataframe from limma. We order its genes by descending t-statistic so that upregulated genes are near the top, then extract the gene symbol column as a vector.
data(limma)
signature <- limma %>%
  dplyr::arrange(desc(t)) %>%
  magrittr::use_series(symbol)
head(signature)
[1] "LMAN2L" "SHKBP1" "SPHK2" "AJUBA" "TJP1" "TMCC3"
genesets <- msigdb_gsets("Homo sapiens", "C2", "CP:REACTOME", clean=TRUE)
hyp_obj <- hypeR(signature, genesets, test="kstest", fdr=0.01)
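The ranking step used to build the signature above can be illustrated on a small made-up table. This is a minimal sketch in base R; the `toy` data frame, its gene names, and its t-statistics are hypothetical and are not part of the limma example data.

```r
# Hypothetical toy differential expression table (values are made up)
toy <- data.frame(
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  t      = c(-2.1, 5.3, 0.4, 3.8),
  stringsAsFactors = FALSE
)

# Order rows by descending t-statistic, then keep only the symbols.
# The result is a character vector with the most upregulated gene first.
toy_signature <- toy$symbol[order(toy$t, decreasing = TRUE)]

print(toy_signature)
```

The order of the vector carries the ranking, which is why the symbols are extracted only after sorting.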
To visualize the results, pass the hyp object to any of the downstream functions.
One can visualize the top enriched genesets using hyp_dots(), which returns a horizontal dots plot. Each dot is a geneset, where the color represents the significance and the size represents the geneset size.
hyp_dots(hyp_obj)
One can visualize the top enriched genesets using hyp_emap(), which returns an enrichment map. Each node represents a geneset, where the shade of red indicates the normalized significance of enrichment. Hover over a node to view the raw value. Edges represent geneset similarity, calculated with either the Jaccard or the overlap similarity metric.
hyp_emap(hyp_obj, similarity_cutoff=0.70)
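The two similarity metrics mentioned for the enrichment map edges can be sketched with their standard textbook definitions. The helper functions and the toy genesets below are hypothetical illustrations; the package's internal implementation may differ in detail.

```r
# Jaccard similarity: shared genes divided by all genes in either set
jaccard_similarity <- function(a, b) {
  a <- unique(a); b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Overlap similarity: shared genes divided by the size of the smaller set
overlap_similarity <- function(a, b) {
  a <- unique(a); b <- unique(b)
  length(intersect(a, b)) / min(length(a), length(b))
}

# Hypothetical toy genesets sharing two genes
gs1 <- c("A", "B", "C", "D")
gs2 <- c("C", "D", "E")

jac <- jaccard_similarity(gs1, gs2)  # 2 shared / 5 total
ovl <- overlap_similarity(gs1, gs2)  # 2 shared / 3 in the smaller set
print(c(jaccard = jac, overlap = ovl))
```

Under these definitions the overlap metric is never smaller than the Jaccard metric for the same pair, so a given similarity cutoff keeps more edges with overlap than with Jaccard.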
When dealing with hundreds of genesets, it’s often useful to understand the relationships between them. This allows researchers to summarize many enriched pathways as more general biological processes. To do this, we rely on curated relationships defined between them.
For example, Reactome conveniently defines its genesets in a hierarchy of pathways. This data can be formatted into a relational genesets object called rgsets.
genesets <- hyperdb_rgsets("REACTOME", version="70.0")
Relational genesets have three data attributes: gsets, nodes, and edges. The genesets attribute includes the geneset information for the leaf nodes of the hierarchy, the nodes attribute describes all nodes in the hierarchy, including internal nodes, and the edges attribute describes the edges in the hierarchy.
print(genesets)
REACTOME v70.0
Genesets
2-LTR circle formation (13)
5-Phosphoribose 1-diphosphate biosynthesis (3)
A tetrasaccharide linker sequence is required for GAG synthesis (26)
ABC transporters in lipid homeostasis (18)
ABO blood group biosynthesis (3)
ADP signalling through P2Y purinoceptor 1 (25)
Nodes
label
R-HSA-164843 2-LTR circle formation
R-HSA-73843 5-Phosphoribose 1-diphosphate biosynthesis
R-HSA-1971475 A tetrasaccharide linker sequence is required for GAG synthesis
R-HSA-5619084 ABC transporter disorders
R-HSA-1369062 ABC transporters in lipid homeostasis
R-HSA-382556 ABC-family proteins mediated transport
id length
R-HSA-164843 R-HSA-164843 13
R-HSA-73843 R-HSA-73843 3
R-HSA-1971475 R-HSA-1971475 26
R-HSA-5619084 R-HSA-5619084 77
R-HSA-1369062 R-HSA-1369062 18
R-HSA-382556 R-HSA-382556 22
Edges
from to
1 R-HSA-109581 R-HSA-109606
2 R-HSA-109581 R-HSA-169911
3 R-HSA-109581 R-HSA-5357769
4 R-HSA-109581 R-HSA-75153
5 R-HSA-109582 R-HSA-140877
6 R-HSA-109582 R-HSA-202733
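The relationship between the nodes and edges tables can be sketched with a toy hierarchy. This is a hypothetical illustration in base R: the ids and labels are made up, and it assumes each edge points from a parent to its child, so that leaf nodes are the ones that never appear in the from column.

```r
# Hypothetical toy hierarchy laid out like the printed Nodes and Edges tables
nodes <- data.frame(
  id    = c("N1", "N2", "N3", "N4"),
  label = c("Root process", "Sub-process", "Leaf pathway A", "Leaf pathway B"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("N1", "N2", "N2"),
  to   = c("N2", "N3", "N4"),
  stringsAsFactors = FALSE
)

# Assumption: edges run parent -> child, so a node with no outgoing edge is a leaf
leaf_ids     <- setdiff(nodes$id, edges$from)
internal_ids <- intersect(nodes$id, edges$from)

print(nodes[nodes$id %in% leaf_ids, ])
print(nodes[nodes$id %in% internal_ids, ])
```

In this toy example only the leaves would carry geneset information, while the internal nodes exist to group them.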
Passing relational genesets works natively with hypeR().
hyp_obj <- hypeR(signature, genesets, test="kstest", fdr=0.01)
One can visualize the top enriched genesets using hyp_hmap(), which returns a hierarchy map. Each node represents a geneset, where the shade of the gold border indicates the normalized significance of enrichment. Hover over the leaf nodes to view the raw value. Double-click internal nodes to cluster their first-degree connections. Edges represent a directed relationship between genesets in the hierarchy. Note: This function only works when the hyp object was initialized with an rgsets object.
hyp_hmap(hyp_obj, top=30)