Creating annotated submodules of high-throughput data through recursive partitioning

K2Taxonomer

Introduction

K2Taxonomer is an R package built around a “top-down” recursive partitioning framework to perform unsupervised learning of nested “taxonomy-like” subgroups from high-throughput -omics data. This framework was devised to be flexibly applicable to different data structures, supporting the analysis of both bulk and single-cell data sets. In addition to implementing the algorithm, the package includes functionalities to annotate estimated subgroups using gene- and pathway-level analyses.

The recursive partitioning approach utilized by K2Taxonomer presents advantages over conventional unsupervised approaches, including:

Identification of robust partitions of a set of observations by aggregating ensembles of partition estimates from repeated perturbations of the data.
Partition-specific feature selection, avoiding the need to perform feature selection on the whole data set prior to running the algorithm.
Tailoring of analyses to specific data structures through the use of different clustering algorithms for partition estimation.
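
To make the first point concrete, the following is a minimal base R sketch of how partition estimates from repeated perturbations can be aggregated into a single two-way split. It is a generic consensus-clustering illustration on simulated data, not K2Taxonomer's internal implementation; the toy matrix, the number of resamples (`n_boot`), and the use of `kmeans()` and `hclust()` are all assumptions made for the example.

```r
# Illustrative only: a generic consensus two-way split, not K2Taxonomer's code.
set.seed(1)

# Toy data: 20 features x 12 observations, two shifted groups of six
n_feat <- 20
x <- cbind(
  matrix(rnorm(n_feat * 6, mean = 0), nrow = n_feat),
  matrix(rnorm(n_feat * 6, mean = 4), nrow = n_feat)
)
colnames(x) <- paste0("obs", seq_len(ncol(x)))

n_boot <- 50  # number of perturbed data sets (arbitrary choice)
n_obs <- ncol(x)
cooccur <- matrix(0, n_obs, n_obs, dimnames = list(colnames(x), colnames(x)))

for (b in seq_len(n_boot)) {
  # Perturb the data by resampling features with replacement
  feats <- sample(n_feat, replace = TRUE)
  # Estimate a two-group partition of the observations
  km <- kmeans(t(x[feats, ]), centers = 2, nstart = 5)
  # Record which pairs of observations were assigned to the same group
  cooccur <- cooccur + outer(km$cluster, km$cluster, "==")
}

# Aggregate: cluster the co-occurrence frequencies and cut into two groups
consensus <- cooccur / n_boot
partition <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = 2)
print(table(partition))
```

Applying the same procedure recursively to each resulting group would yield a nested, top-down hierarchy of the kind described above.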

The package documentation describes applications of K2Taxonomer to both single-cell and bulk gene expression data. For analyses of single-cell gene expression data, K2Taxonomer is designed to characterize nested subgroups of previously identified cell types, such as those estimated by scRNAseq clustering analysis.

Cite

Reed, Eric R, and Stefano Monti. “Multi-Resolution Characterization of Molecular Taxonomies in Bulk and Single-Cell Transcriptomics Data.” Nucleic Acids Research 49, no. 17 (September 27, 2021): e98. https://doi.org/10.1093/nar/gkab552.

Documentation

Articles describing K2Taxonomer workflows can be found on the package’s GitHub Page.

Requirements

R (>= 4.0)

Installation

You may install K2Taxonomer directly from GitHub using the devtools R package, or clone the repository and install from source.

install.packages("devtools")
devtools::install_github("montilab/K2Taxonomer")

Usage

Load packages and read in gene expression data

library(K2Taxonomer)

Required data sets

K2Taxonomer requires two data inputs:

An object comprising expression and metadata. This must be one of three object classes: ExpressionSet, Seurat, or SingleCellExperiment.
An object comprising a named list of gene signatures.

Expression and metadata

The following example takes a Seurat object as input, which includes the following:

An “integrated” data slot, which contains the batch-corrected, scaled data used in Seurat clustering.
An “RNA” slot, which contains the unintegrated expression data.
A column called “seurat_clusters”, which contains the cluster labels.

Gene sets

Gene sets are supplied as a named list of vectors containing gene identifiers. For example:

GENESETS <- list(
  GS1 = c("LYZ", "AIF1", "S100A11", "FCER1G", "SAT1", "LST1", "DUSP1", "S100A4", "CTSS", "SERPINA1"),
  GS2 = c("STMN1", "MYBL2", "HIST1H4C", "RPLP0", "RPSA", "TYMS", "NUSAP1", "HMGB1", "LDHB", "C12orf75")
)
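
Before passing gene sets to the package, it can be useful to confirm that the list is named and that its identifiers overlap the features in the expression data. The sketch below uses a hypothetical helper, `geneset_coverage()`, which is not part of K2Taxonomer, together with toy gene sets and a toy `features` vector standing in for the row names of an expression matrix.

```r
# Hypothetical helper (not part of K2Taxonomer): for each gene set, count how
# many of its identifiers are present among the features of the data.
geneset_coverage <- function(genesets, features) {
  stopifnot(
    is.list(genesets),
    !is.null(names(genesets)),
    all(nzchar(names(genesets)))
  )
  data.frame(
    geneset = names(genesets),
    n_genes = unname(lengths(genesets)),
    n_found = unname(vapply(
      genesets,
      function(gs) sum(unique(gs) %in% features),
      integer(1)
    )),
    row.names = NULL
  )
}

# Toy inputs for illustration only
features <- c("LYZ", "AIF1", "STMN1", "TYMS", "CD3E")
toy_sets <- list(GS1 = c("LYZ", "AIF1", "SAT1"), GS2 = c("STMN1", "TYMS"))

coverage <- geneset_coverage(toy_sets, features)
print(coverage)
```

A gene set with few or no identifiers found usually points to a mismatch in identifier type (for example, gene symbols versus Ensembl IDs) rather than a biological result.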

Initialize K2Taxonomer

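# Use a reproducible random number generator and seed
RNGkind("L'Ecuyer-CMRG")
set.seed(1)

# `seu` is the Seurat object described above
K2res <- K2preproc(seu,
                   cohorts = "seurat_clusters",   # metadata column with cluster labels
                   seuAssay = "integrated",       # batch-corrected data used in clustering
                   seuAssayDS = "RNA",            # unintegrated expression data
                   featMetric = "F",
                   logCounts = TRUE,
                   clustFunc = "cKmeansDownsampleSqrt",
                   useCors = 8,
                   DGEmethod = "mast",
                   genesets = GENESETS,           # named list of gene sets
                   ScoreGeneSetMethod = "AUCELL") # gene set scoring algorithm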

Run K2T algorithm

K2res <- K2tax(K2res)

Run differential expression analysis to identify markers of each partition

K2res <- runDGEmods(K2res)

Run gene set enrichment based on significantly differentially expressed genes

K2res <- runFISHERmods(K2res)
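
For readers unfamiliar with this kind of enrichment, the base R sketch below shows a one-sided Fisher's exact test for over-representation of a single gene set among significant genes. It assumes, based on the function name, that the enrichment step is a Fisher-type test; the gene universe, significant genes, and gene set are toy values, and the package's actual inputs and thresholds may differ.

```r
# Illustrative over-representation test for one gene set; all inputs are toy values
universe <- paste0("gene", 1:100)          # all genes tested
sig_genes <- paste0("gene", 1:20)          # genes called significant
gene_set <- paste0("gene", c(1:8, 51:52))  # gene set of interest

in_set <- universe %in% gene_set
is_sig <- universe %in% sig_genes

# 2x2 contingency table: gene set membership by significance
tab <- table(
  in_set = factor(in_set, levels = c(TRUE, FALSE)),
  is_sig = factor(is_sig, levels = c(TRUE, FALSE))
)

# One-sided test: is the gene set over-represented among significant genes?
res <- fisher.test(tab, alternative = "greater")
print(tab)
print(res$p.value)
```

In practice the test would be repeated for every gene set and every partition, followed by a multiple-testing correction such as `p.adjust(..., method = "BH")`.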

Run gene set scoring with specified algorithm

The scoring algorithm is set by the ScoreGeneSetMethod argument of K2preproc(), which can be either “AUCELL” or “GSVA”.

K2res <- runScoreGeneSets(K2res)

Run differential analysis of gene set scores

K2res <- runDSSEmods(K2res)

Create dashboard

K2dashboard(K2res)

Functions for results visualization

Plot dendrogram of K2tax() output

plot(K2dendro(K2res))

Create interactive dendrogram

K2visNetwork(K2res)

Create table of differential gene expression results

DGEtable <- getDGETable(K2res)

Create interactive table of differential gene expression results

getDGEInter(K2res, node = c("A"))

Create table of gene set results

ENRtable <- getEnrichmentTable(K2res)

Create interactive table of gene set results

getEnrichmentInter(K2res, nodes = c("A"))

Plot gene expression

plotGenePathway(K2res, feature = "LYZ", node = "A", use_plotly = FALSE)

Create interactive plot of gene expression

plotGenePathway(K2res, feature = "LYZ", node = "A")

Plot single-sample gene set scores

plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat", use_plotly = FALSE)

Create interactive plot of single-sample gene set scores

plotGenePathway(K2res, feature = "GS1", node = "A", type = "gMat")