Skip to contents

Introduction

This vignette describes the workflow for running K2Taxonomer recursive partitioning on single-cell gene expression data (Reed and Monti 2021). Note, that many of these steps are shared with that of bulk expression analyses. A vignette for running K2Taxonomer on bulk expression data can be found here.

The single-cell expression workflow performs partitioning at the level of annotated cell clusters and/or cell types. Note, for recursive partitioning it is recommended to use the same data matrix from which clustering tasks were performed, such as an integrated data matrix generated prior to and used for clustering. For tasks downstream of partitioning, users can specify an alternative data matrix.

Requirements

Load packages

## K2Taxonomer package
library(K2Taxonomer)

## Seurat package
library(Seurat)

## For drawing dendrograms
library(ggdendro)

Read in single-cell RNAseq data

data("ifnb_small")

Read in gene sets for subgroup annotation

data("cellMarker2_genesets")

The IFNB Data

DimPlot(ifnb_small, label = TRUE, raster =TRUE,  pt.size = 3, alpha = 0.4) + NoLegend()

Running K2Taxonomer

Get necessary data objects

## Integrated expression matrix used for clustering data
integrated_expression_matrix <- ifnb_small@assays$integrated$scale.data

## Normalized expression matrix to be used for downstream analyses
normalized_expression_matrix <- ifnb_small@assays$SCT$data

## Profile-level information
cell_data <- ifnb_small@meta.data

Initialize K2 object

The K2preproc() initializes the K2 object and runs pre-processing steps. Here, you can specify all arguments used throughout the analysis. Otherwise, you can specify these arguments within the specific functions for which they are implemented. See help pages for more information.

A description of arguments implemented in this vignette are

  • object: “Expression Matrix”: Expression matrix to be used for partioning.
  • eMatDS: “Expression Matrix”: Optional: Expression matrix to be used for downstream analysis. This is useful if the matrix in object is comprised of data that has been integrated across experiments/samples because these are generally considered insufficent for statistical analyses of singe-cell gene expression.
  • colData: “Data Frame”: A data frame which contains a row for each profile (i.e. columns in object).
  • cohorts: “String”: A column name in colData. This column specifies the cluster/cell type annotations for each profile.
  • nBoots: “Integer”: The number of bootstraps to run at each partition.
  • clustFunc: “String” or “Function”: The partitioning function to use. If a string, the function will be selected from built-in functions.
  • genesets: “Named List”: A named list of gene sets to be used for downstream annotation.
# Initialize `K2` object
K2res <- K2preproc(object = integrated_expression_matrix,
                   eMatDS = normalized_expression_matrix,
                   colData = cell_data,
                   cohorts="cell_type",
                   nBoots = 200,
                   clustFunc = "cKmeansDownsampleSqrt",
                   genesets = cellMarker2_genesets)

Run recursive partitioning algorithm

The K2Taxonomer is run by K2tax(). At each recursion of the algorithm, the observations are partitioned into two sub-groups based on a compilation of repeated K=2 clustering on bootstrapped sampling of features.

K2res <- K2tax(K2res)

Generate dendrogram from K2Taxonomer results

Static dendrogram
## Get dendrogram from K2Taxonomer
dendro <- K2dendro(K2res)

## Plot dendrogram
ggdendrogram(dendro)

Interactive dendrogram

Annotate K2Taxonomer results

Partition-level Differential Gene Expression Analysis

K2res <- runDGEmods(K2res)
#> Running DGE for partition:
#>   1 / 15 
#>   2 / 15 
#>   3 / 15 
#>   4 / 15 
#>   5 / 15 
#>   6 / 15 
#>   7 / 15 
#>   8 / 15 
#>   9 / 15 
#>   10 / 15 
#>   11 / 15 
#>   12 / 15 
#>   13 / 15 
#>   14 / 15 
#>   15 / 15

Perform gene set based analyses

### Perform Fisher Exact Test based over-representation analysis
K2res <- runFISHERmods(K2res)

### Perform single-sample gene set scoring
K2res <- runScoreGeneSets(K2res)
#> Scoring gene sets with GSVA.
#> Estimating GSVA scores for 26 gene sets.
#> Estimating ECDFs with Gaussian kernels

### Perform partition-level differential gene set score analysis
K2res <- runDSSEmods(K2res)
#> Running different enrichment score for partition:
#>   1 / 15 
#>   2 / 15 
#>   3 / 15 
#>   4 / 15 
#>   5 / 15 
#>   6 / 15 
#>   7 / 15 
#>   8 / 15 
#>   9 / 15 
#>   10 / 15 
#>   11 / 15 
#>   12 / 15 
#>   13 / 15 
#>   14 / 15 
#>   15 / 15

Explore Annotation Results

Partition-level Differential Gene Expression Analysis

Create Static Table of DGE Results

DGEtable <- getDGETable(K2res)
head(DGEtable)
#>       gene     coef      mean        t       df pval fdr edge node direction
#> 1      FTL 2.779654 2.6534606 58.00984 1594.629    0   0    2    A        up
#> 2    TIMP1 2.394971 1.0806257 66.56321 1610.346    0   0    2    A        up
#> 3  HLA-DRA 2.230605 0.5719218 62.69911 1117.113    0   0    1    D        up
#> 4 C15orf48 2.015372 0.8097896 71.48105 1711.519    0   0    2    A        up
#> 5     SOD2 1.901742 0.8700197 61.28335 1711.519    0   0    2    A        up
#> 6   TYROBP 1.808682 0.8001095 86.60850 1711.519    0   0    2    A        up

Create Interactive Table of DGE Results

getDGEInter(K2res, minDiff = 1, node = c("A"), pagelength = 10)

Create Plot of Gene Expression

plotGenePathway(K2res, feature = "FTL", node = "A")

Create Static Plot Gene Expression

plotGenePathway(K2res, feature = "FTL", node = "A", use_plotly = FALSE)

Partition-level Gene Set Enrichment Results

Create Static Table of Gene Set Results

ENRtable <- getEnrichmentTable(K2res)
head(ENRtable)
#>             category node edge direction  pval_fisher   fdr_fisher nhits ndrawn
#> 1           Monocyte    A    2        up 7.911659e-21 5.957479e-18    19    409
#> 2         Macrophage    A    2        up 7.490187e-05 5.108308e-02     4    409
#> 3      Memory T cell    A    2        up 1.907696e-09 1.411695e-06     6    409
#> 4  Naive T(Th0) cell    A    1        up 6.009134e-09 4.410705e-06     6    252
#> 5      Megakaryocyte    K    2        up 8.468647e-08 6.156706e-05     3     24
#> 6 Classical monocyte    A    2        up 1.103540e-25 8.320692e-23    20    409
#>   ncats  ntot    pval_limma     fdr_limma      coef        mean        t
#> 1    45 20000  0.000000e+00  0.000000e+00 0.5628148 -0.24219423 87.14822
#> 2    12 20000 4.940656e-323 1.921915e-320 0.5496507 -0.25818063 49.24827
#> 3     8 20000 6.474983e-300 2.512294e-297 0.7812982 -0.16843514 46.54556
#> 4    13 20000 1.647213e-208 6.374713e-206 0.4924973 -0.20724307 36.00392
#> 5     8 20000 1.629249e-198 6.288902e-196 0.5863763  0.08476728 45.84733
#> 6    34 20000 4.399699e-181 1.693884e-178 0.4686035 -0.18868112 32.81403
#>                                                                                                                                         hits
#> 1                                 C5AR1,CD14,CD38,CD68,CD86,CREG1,CST3,FCGR3A,FGL2,LYZ,MNDA,MS4A7,OAZ1,PSAP,S100A6,S100A8,S100A9,SOD2,TYROBP
#> 2                                                                                                                        CD14,CD68,LYZ,MS4A7
#> 3                                                                                                 LGALS1,LGALS3,MYL12A,S100A10,S100A4,S100A6
#> 4                                                                                                             CCR7,CD3D,HSPD1,NPM1,PTMA,SELL
#> 5                                                                                                                             PF4,PPBP,TUBB1
#> 6 CD14,CD74,CDKN1A,CYP1B1,EPSTI1,FCGR3A,FCN1,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DRA,HLA-DRB1,IFIT2,IFIT3,IFITM3,ISG15,LYZ,S100A8,S100A9

Create Interactive Table of Gene Set Results

getEnrichmentInter(K2res, nodes = c("A"), pagelength = 10)

Create Interactive Plot of Single-sample Scoring

plotGenePathway(K2res, feature = "Monocyte", node = "A", type = "gMat")

Create Static Plot of Single-sample Scoring

plotGenePathway(K2res, feature = "Monocyte", node = "A", type = "gMat", use_plotly = FALSE)

Create Dashboard

For more information about K2Taxonomer dashboards, read this vignette.

# Not run
K2dashboard(K2res, "K2results_ifnb_small")

Implementation Options

Parellel excecutation

Given their size, parts of the K2Taxonomer workflow can take a long time with single-cell data sets. Accordingly, it is generally recommended to run the workflow using parallel computing. This can be implemented easily by setting the useCors argument in K2preproc()

# Not run
K2res <- K2preproc(object = integrated_expression_matrix,
                   eMatDS = normalized_expression_matrix,
                   colData = cell_data,
                   cohorts="cell_type",
                   nBoots = 200,
                   useCors = 8, ## Runs K2Taxonomer in parellel with eight cores.
                   clustFunc = "cKmeansDownsampleSqrt",
                   genesets = cellMarker2_genesets)

Using Seurat object directly

In addition to expression matrices, Seurat objects may be input directly with the object argument. When implemented, colData isn’t specified and this information is pulled from the meta data of the Seurat object. Two additional arguments may be set, each specifying the assay of the Seurat objects to pull expression data from.

  • seuAssay: “String”: Name of Seurat assay to pull expression data for recursive partitioning. The ‘scale.data’ slot will be pulled from this assay.
  • seuAssayDS: “String”: Name of Seurat assay to pull expression data for downstream analysis. The ‘data’ slot will be pulled from this assay.
K2res_seurat <- K2preproc(object = ifnb_small,
                   cohorts="cell_type",
                   seuAssay = "integrated",
                   seuAssayDS = "SCT",
                   nBoots = 200,
                   clustFunc = "cKmeansDownsampleSqrt",
                   genesets = cellMarker2_genesets)

## Run recursive partitioning algorithm
K2res_seurat <- K2tax(K2res_seurat)

## Get dendrogram from K2Taxonomer
dendro_seurat <- K2dendro(K2res_seurat)

## Plot dendrogram
ggdendrogram(dendro_seurat)

References

Reed, Eric R., and Stefano Monti. 2021. “Multi-Resolution Characterization of Molecular Taxonomies in Bulk and Single-Cell Transcriptomics Data.” Nucleic Acids Research. https://doi.org/10.1093/nar/gkab552.