R/runK2Taxonomer.R
runK2Taxonomer.Rd
Generates an object of class K2. The function runs the K2 Taxonomer procedure, then differential analysis, and finally hyperenrichment on a named list of feature sets.

    runK2Taxonomer(
      eSet,
      cohorts = NULL,
      vehicle = NULL,
      covariates = NULL,
      block = NULL,
      logCounts = FALSE,
      use = c("Z", "MEAN"),
      nFeats = "sqrt",
      featMetric = c("mad", "sd", "Sn", "Qn", "F", "square"),
      recalcDataMatrix = FALSE,
      nBoots = 500,
      clustFunc = hclustWrapper,
      clustCors = 1,
      clustList = list(),
      linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete", "average", "centroid"),
      info = NULL,
      infoClass = NULL,
      genesets = NULL,
      qthresh = 0.05,
      cthresh = 0,
      ntotal = 20000,
      ssGSEAalg = c("gsva", "ssgsea", "zscore", "plage"),
      ssGSEAcores = 1,
      oneoff = TRUE,
      stabThresh = 0,
      geneURL = NULL,
      genesetURL = NULL
    )

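Several arguments in the usage above (`use`, `featMetric`, `linkage`, `ssGSEAalg`) have a character vector as their default. A common R idiom is to resolve such arguments with `match.arg()`, which picks the first element when the argument is left untouched. The sketch below illustrates that idiom with a hypothetical function, `pickOption()`. Whether `runK2Taxonomer()` resolves its defaults this way is an assumption, not something this page confirms.

```r
# Hypothetical helper that mirrors two of the vector-valued defaults above.
# It is not part of K2Taxonomer; it only demonstrates the match.arg() idiom.
pickOption <- function(use = c("Z", "MEAN"),
                       linkage = c("mcquitty", "ward.D", "ward.D2", "single",
                                   "complete", "average", "centroid")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and validates any value the caller supplies.
  use <- match.arg(use)
  linkage <- match.arg(linkage)
  list(use = use, linkage = linkage)
}

defaults <- pickOption()
custom <- pickOption(use = "MEAN", linkage = "average")
```

Under this idiom, leaving `linkage` unset selects `"mcquitty"`, and an unsupported value raises an error instead of being silently accepted.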
Argument | Description |
---|---|
eSet | An expression set object. |
cohorts | The column in the phenotype data of eSet that contains cohort IDs. Default is NULL (no pre-processing of the data). |
vehicle | The value of the cohort variable that identifies the vehicle. Default is NULL (no vehicle used). |
covariates | Covariates in phenotype data of eSet to control for in differential analysis. |
block | Block parameter in limma for modelling random-like effects. |
logCounts | Logical. Whether or not expression values are log-scale counts or log normalized counts from RNA-seq. Default is FALSE. |
use | Character string. Options are 'Z' to generate test statistics or 'MEAN' to use means from differential analysis for clustering. |
nFeats | 'sqrt' or a numeric value <= number of features to subset the features for each partition. |
featMetric | Metric used to assign a variance/signal score to each feature. Options are: 'mad' (default), 'sd', 'Sn', 'Qn', 'F', and 'square'. |
recalcDataMatrix | Logical. Recalculate dataMatrix for each partition? Default is FALSE. |
nBoots | A numeric value of the number of bootstraps to run at each split. |
clustFunc | Wrapper function to cluster a P x N data matrix (see details). |
clustCors | Number of cores to use for clustering. |
clustList | List of objects to use for clustering procedure. |
linkage | Linkage criterion for splitting the cosine matrix ('method' in hclust). 'mcquitty' by default, as the first option listed in the usage. |
info | A data frame with rownames that match column names in dataMatrix. |
infoClass | A named vector denoting the types of tests to run on metavariables. |
genesets | A named list of features in row names of dataMatrix. |
qthresh | A numeric value between 0 and 1 giving the FDR cutoff used to define feature sets. |
cthresh | A positive value giving the coefficient cutoff used to define feature sets. |
ntotal | A positive value to use as the background feature count. 20000 by default. |
ssGSEAalg | A character string, specifying which algorithm to use for running the gsva() function from the GSVA package. Options are 'gsva', 'ssgsea', 'zscore', and 'plage'. 'gsva' by default. |
ssGSEAcores | Number of cores to use for running gsva() from the GSVA package. Default is 1. |
oneoff | Logical. Allow one-member clusters? Default is TRUE. |
stabThresh | Threshold for ending clustering. |
geneURL | Optional. Named list of URLs to gene information. |
genesetURL | Optional. Named list of URLs to geneset information. |
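
The `featMetric`, `nFeats`, and `linkage` arguments describe a feature-scoring step followed by hierarchical clustering on a cosine matrix. The following base R sketch shows one way these pieces could fit together for a single split on a toy matrix. It is an illustration only: the helper `scoreFeatures()`, the rounding of the square root, and the conversion of cosine similarity to a distance are assumptions, and the package's own procedure (including bootstrapping via `nBoots`) is not reproduced here.

```r
set.seed(1)

# Toy P x N matrix: 100 features (rows) by 8 observations (columns)
dataMatrix <- matrix(rnorm(100 * 8), nrow = 100,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("obs", 1:8)))

# Hypothetical helper: score each feature with stats::mad() or stats::sd()
scoreFeatures <- function(mat, featMetric = c("mad", "sd")) {
  featMetric <- match.arg(featMetric)
  switch(featMetric,
         mad = apply(mat, 1, mad),
         sd  = apply(mat, 1, sd))
}

scores <- scoreFeatures(dataMatrix, "mad")

# nFeats = "sqrt": keep the sqrt(P) top-scoring features
# (rounding up is an assumption made for this sketch)
nFeats <- ceiling(sqrt(nrow(dataMatrix)))
topFeats <- names(sort(scores, decreasing = TRUE))[seq_len(nFeats)]
subMatrix <- dataMatrix[topFeats, ]

# Cosine similarity between observations, converted to a distance
norms <- sqrt(colSums(subMatrix^2))
cosSim <- crossprod(subMatrix) / outer(norms, norms)
cosDist <- as.dist(1 - cosSim)

# Split the observations into two groups with the chosen linkage
tree <- hclust(cosDist, method = "mcquitty")
groups <- cutree(tree, k = 2)
```

Swapping `method = "mcquitty"` for any other value accepted by `hclust()` changes only the linkage step, which is the role the `linkage` argument plays in the table above.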
An object of class `K2`.
Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

    ## Read in ExpressionSet object
    library(Biobase)
    data(sample.ExpressionSet)

    ## Create dummy set of gene sets
    genes <- rownames(sample.ExpressionSet)
    genesetsMadeUp <- list(
      GS1 = genes[1:50],
      GS2 = genes[51:100],
      GS3 = genes[101:150])

    ## Run K2 Taxonomer wrapper
    K2Res <- runK2Taxonomer(sample.ExpressionSet,
                            genesets = genesetsMadeUp,
                            qthresh = 0.1,
                            ssGSEAalg = 'gsva',
                            ssGSEAcores = 1,
                            stabThresh = 0.5)
    #> Running K2Taxonomer bootstraps.
    #> Running differential analysis on genes.
    #> Running hyperenrichment analysis on genes with FDR<0.1 & |coefficient|>0.
    #> Running ssGSEA.
    #> Estimating GSVA scores for 3 gene sets.
    #> Estimating ECDFs with Gaussian kernels
    #> Running differential analysis on gene sets.
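
The log line "Running hyperenrichment analysis on genes with FDR<0.1 & |coefficient|>0" reflects the `qthresh` and `cthresh` arguments, and `ntotal` supplies the background feature count. As a rough illustration of how such a test can be set up, the sketch below filters simulated differential analysis results and runs a hypergeometric over-representation test with `stats::phyper()`. The data frame `res`, its column names, and the single gene set are all made up for this example, and the package may construct its signatures differently (for instance, separately by direction of change).

```r
set.seed(42)

# Simulated differential analysis results for 200 genes (hypothetical data)
res <- data.frame(
  gene = paste0("gene", 1:200),
  coef = rnorm(200),
  fdr  = runif(200),
  stringsAsFactors = FALSE
)

qthresh <- 0.1    # FDR cutoff
cthresh <- 0      # absolute coefficient cutoff
ntotal  <- 20000  # background feature count

# Signature: genes passing both thresholds
sig <- res$gene[res$fdr < qthresh & abs(res$coef) > cthresh]

# One made-up gene set to test for over-representation
geneset <- paste0("gene", 1:50)
overlap <- length(intersect(sig, geneset))

# Hypergeometric upper tail: P(X >= overlap) when drawing length(sig) genes
# from ntotal, of which length(geneset) belong to the gene set
pval <- phyper(overlap - 1,
               m = length(geneset),
               n = ntotal - length(geneset),
               k = length(sig),
               lower.tail = FALSE)
```

Because `ntotal` sets the size of the background, a larger value makes any given overlap look more surprising, so it should match the number of features that could plausibly have been detected.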