Wrapper function to run K2Taxonomer algorithm and annotation

This function generates an object of class K2. It runs the K2 Taxonomer procedure, then differential analysis, and finally hyperenrichment on a named list of feature sets.

runK2Taxonomer(
    eSet,
    cohorts = NULL,
    vehicle = NULL,
    covariates = NULL,
    block = NULL,
    logCounts = FALSE,
    use = c("Z", "MEAN"),
    nFeats = "sqrt",
    featMetric = c("mad", "sd", "Sn", "Qn", "F", "square"),
    recalcDataMatrix = FALSE,
    nBoots = 500,
    clustFunc = hclustWrapper,
    clustCors = 1,
    clustList = list(),
    linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete",
        "average", "centroid"),
    info = NULL,
    infoClass = NULL,
    genesets = NULL,
    qthresh = 0.05,
    cthresh = 0,
    ntotal = 20000,
    ssGSEAalg = c("gsva", "ssgsea", "zscore", "plage"),
    ssGSEAcores = 1,
    oneoff = TRUE,
    stabThresh = 0,
    geneURL = NULL,
    genesetURL = NULL
)
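
Several arguments in the usage above (use, featMetric, linkage, ssGSEAalg) list a character vector of choices as their default. The sketch below shows how such a default is conventionally resolved in R with match.arg(), where the first element is used when the argument is not supplied. It assumes runK2Taxonomer follows this convention, which this page does not state; pickLinkage is a hypothetical stand-in, not part of the package.

```r
# Hypothetical stand-in that mimics only the 'linkage' argument;
# it is NOT part of K2Taxonomer.
pickLinkage <- function(linkage = c("mcquitty", "ward.D", "ward.D2",
        "single", "complete", "average", "centroid")) {
    # match.arg() returns the first choice when the argument is missing,
    # and raises an error for values outside the listed choices.
    match.arg(linkage)
}

defaultLinkage <- pickLinkage()          # argument omitted
chosenLinkage <- pickLinkage("average")  # explicit choice
badLinkage <- try(pickLinkage("not-a-method"), silent = TRUE)
```

Under this convention, omitting an argument selects the first listed choice, so it is worth passing the value explicitly when a specific method is intended.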

Arguments

eSet	An expression set object.
cohorts	The column in the phenotype data of eSet that contains cohort IDs. Default is NULL (no pre-processing of the data).
vehicle	The value in the cohort variable that identifies the vehicle. Default is NULL (no vehicle is used).
covariates	Covariates in phenotype data of eSet to control for in differential analysis.
block	Block parameter in limma for modelling random-like effects.
logCounts	Logical. Whether or not expression values are log-scale counts or log normalized counts from RNA-seq. Default is FALSE.
use	Character string. Options are 'Z' to generate test statistics or 'MEAN' to use means from differential analysis for clustering.
nFeats	'sqrt' or a numeric value <= the number of features, used to subset the features for each partition.
featMetric	Metric to use to assign variance/signal score. Options are: 'mad' (default), 'sd', 'Sn', 'Qn', 'F', and 'square'.
recalcDataMatrix	Logical. Recalculate dataMatrix for each partition? Default is FALSE.
nBoots	A numeric value of the number of bootstraps to run at each split.
clustFunc	Wrapper function to cluster a P x N matrix (see Details).
clustCors	Number of cores to use for clustering.
clustList	List of objects to use for clustering procedure.
linkage	Linkage criteria for splitting cosine matrix ('method' in hclust). 'mcquitty' by default.
info	A data frame with rownames that match column names in dataMatrix.
infoClass	A named vector denoting the types of tests to run on metavariables.
genesets	A named list of features in row names of dataMatrix.
qthresh	A numeric value between 0 and 1 of the FDR cutoff to define feature sets.
cthresh	A positive value for the coefficient cutoff to define feature sets.
ntotal	A positive value to use as the background feature count. 20000 by default.
ssGSEAalg	A character string, specifying which algorithm to use for running the gsva() function from the GSVA package. Options are 'gsva', 'ssgsea', 'zscore', and 'plage'. 'gsva' by default.
ssGSEAcores	Number of cores to use for running gsva() from the GSVA package. Default is 1.
oneoff	Logical. Allow one-member clusters?
stabThresh	Threshold for ending clustering.
geneURL	Optional. Named list of URLs to gene information.
genesetURL	Optional. Named list of URLs to geneset information.
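
To make the nFeats and featMetric arguments concrete, here is a minimal base R sketch of feature filtering: each feature (row) gets a variability score, and the top-scoring features are kept. It assumes that 'sqrt' means keeping roughly the square root of the number of features and that the 'mad' score is stats::mad applied per row; the package's internal rounding and scoring details are not given on this page, and all object names below are illustrative.

```r
# Illustrative only: rank features by MAD and keep the top sqrt(P).
set.seed(1)
P <- 100   # number of features (rows)
N <- 12    # number of observations (columns)
dataMatrix <- matrix(rnorm(P * N), nrow = P,
    dimnames = list(paste0("feat", seq_len(P)), paste0("obs", seq_len(N))))

# Inflate the first five features so they are clearly the most variable.
dataMatrix[1:5, ] <- dataMatrix[1:5, ] * 50

# Assumption: 'sqrt' keeps ceiling(sqrt(P)) features; a number is used as is.
nFeats <- "sqrt"
nKeep <- if (identical(nFeats, "sqrt")) {
    ceiling(sqrt(nrow(dataMatrix)))
} else {
    nFeats
}

# Score each feature by its median absolute deviation across observations.
score <- apply(dataMatrix, 1, stats::mad)

# Keep the nKeep highest-scoring features.
topFeats <- names(sort(score, decreasing = TRUE))[seq_len(nKeep)]
subMatrix <- dataMatrix[topFeats, , drop = FALSE]
```

Swapping stats::mad for stats::sd in the apply() call gives the analogous 'sd' ranking.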

Value

An object of class, `K2`.

References

Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

Examples

## Read in ExpressionSet object
library(Biobase)
data(sample.ExpressionSet)

## Create dummy set of gene sets
genes <- rownames(sample.ExpressionSet)
genesetsMadeUp <- list(
    GS1=genes[1:50],
    GS2=genes[51:100],
    GS3=genes[101:150])

## Run K2 Taxonomer wrapper
K2Res <- runK2Taxonomer(sample.ExpressionSet,
                    genesets=genesetsMadeUp,
                    qthresh=0.1,
                    ssGSEAalg='gsva',
                    ssGSEAcores=1,
                    stabThresh=0.5)
#> Running K2Taxonomer bootstraps.
#> Running differential analysis on genes.
#> Running hyperenrichment analysis on genes with FDR<0.1 & |coefficient|>0.
#> Running ssGSEA.
#> Estimating GSVA scores for 3 gene sets.
#> Estimating ECDFs with Gaussian kernels
#> Running differential analysis on gene sets.
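
The message "Running hyperenrichment analysis on genes with FDR<0.1 & |coefficient|>0" reflects the qthresh and cthresh arguments. The following base R sketch shows one way such a step can work: features passing both thresholds form a signature, which is tested for over-representation in a gene set with a one-sided hypergeometric test against a background of ntotal features. This is an assumption-laden illustration, not the package's implementation; the data frame, its column names, and the gene names are made up.

```r
# Hypothetical differential analysis results for 200 features.
res <- data.frame(
    feature = paste0("gene", 1:200),
    coef = c(rep(1.5, 20), rep(0.2, 180)),
    fdr = c(rep(0.01, 20), rep(0.5, 180)),
    stringsAsFactors = FALSE)

qthresh <- 0.1    # FDR cutoff
cthresh <- 0      # absolute coefficient cutoff
ntotal <- 20000   # background feature count

# Signature: features passing both thresholds (gene1 to gene20 here).
sig <- res$feature[res$fdr < qthresh & abs(res$coef) > cthresh]

# Made-up gene set of 50 features, 10 of which are in the signature.
geneset <- paste0("gene", 11:60)
overlap <- length(intersect(sig, geneset))

# P(X >= overlap) when drawing length(sig) features from a background
# of ntotal features that contains length(geneset) gene set members.
pval <- phyper(overlap - 1,
    m = length(geneset),
    n = ntotal - length(geneset),
    k = length(sig),
    lower.tail = FALSE)
```

A larger ntotal makes a given overlap look more surprising, so the background count should reflect the number of features that could plausibly have been detected.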