Generates an object of class `K2` and runs the pre-processing functions required by the K2 Taxonomer procedure.

    K2preproc(
        eSet,
        cohorts = NULL,
        vehicle = NULL,
        covariates = NULL,
        block = NULL,
        logCounts = FALSE,
        use = c("Z", "MEAN"),
        nFeats = "sqrt",
        featMetric = c("mad", "sd", "Sn", "Qn", "F", "square"),
        recalcDataMatrix = FALSE,
        nBoots = 500,
        clustFunc = hclustWrapper,
        clustCors = 1,
        clustList = list(),
        linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete", "average", "centroid"),
        info = NULL,
        infoClass = NULL,
        genesets = NULL,
        qthresh = 0.05,
        cthresh = 0,
        ntotal = 20000,
        ssGSEAalg = c("gsva", "ssgsea", "zscore", "plage"),
        ssGSEAcores = 1,
        oneoff = TRUE,
        stabThresh = 0,
        geneURL = NULL,
        genesetURL = NULL
    )

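
Several arguments in the usage above (`use`, `featMetric`, `linkage`, `ssGSEAalg`) default to a character vector of choices. A common R convention is to resolve such a default with `match.arg()`, which returns the first element when the caller supplies nothing. The sketch below illustrates that convention with a hypothetical helper, `pickMetric`; whether `K2preproc` resolves these arguments this way internally is an assumption.

```r
# Hypothetical helper illustrating the match.arg() convention;
# it is not part of K2Taxonomer.
pickMetric <- function(featMetric = c("mad", "sd", "Sn", "Qn", "F", "square")) {
    # With no value supplied, match.arg() returns the first listed choice;
    # a supplied value must match one of the listed choices.
    match.arg(featMetric)
}

defaultMetric <- pickMetric()          # nothing supplied: first choice is used
chosenMetric  <- pickMetric("square")  # explicit choice from the list
```

Under this convention, passing a single string such as `linkage = "average"` selects that option explicitly rather than relying on the order of the choices.
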
Argument | Description |
---|---|
eSet | An expression set object. |
cohorts | The column in the phenotype data of eSet that holds the cohort IDs. Default is NULL, for no pre-processing of the data. |
vehicle | The value of the cohort variable that identifies the vehicle. Default is NULL, for no vehicle. |
covariates | Covariates in phenotype data of eSet to control for in differential analysis. |
block | Block parameter in limma for modelling random-like effects. |
logCounts | Logical. Whether the expression values are log-scale counts or log-normalized counts from RNA-seq. Default is FALSE. |
use | Character string. Options are 'Z' to generate test statistics or 'MEAN' to use means from differential analysis for clustering. |
nFeats | 'sqrt' or a numeric value <= number of features to subset the features for each partition. |
featMetric | Metric used to assign the variance/signal score. One of 'mad', 'sd', 'Sn', 'Qn', 'F', or 'square'; 'mad' uses MAD scores and 'square' uses squared values. 'mad' is listed first in the usage. |
recalcDataMatrix | Logical. Recalculate dataMatrix for each partition? Default is FALSE. |
nBoots | A numeric value of the number of bootstraps to run at each split. |
clustFunc | Wrapper function to cluster a P x N matrix (see Details). |
clustCors | Number of cores to use for clustering. |
clustList | List of objects to use for clustering procedure. |
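linkage | Linkage criterion for splitting the cosine matrix ('method' in hclust). One of 'mcquitty', 'ward.D', 'ward.D2', 'single', 'complete', 'average', or 'centroid'; 'mcquitty' is listed first in the usage. |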
info | A data frame with rownames that match column names in dataMatrix. |
infoClass | A named vector denoting the types of tests to run on metavariables. |
genesets | A named list of features in row names of dataMatrix. |
qthresh | A numeric value between 0 and 1 giving the FDR cutoff used to define feature sets. |
cthresh | A positive value giving the coefficient cutoff used to define feature sets. |
ntotal | A positive value to use as the background feature count. 20000 by default. |
ssGSEAalg | A character string, specifying which algorithm to use for running the gsva() function from the GSVA package. Options are 'gsva', 'ssgsea', 'zscore', and 'plage'. 'gsva' by default. |
ssGSEAcores | Number of cores to use for running gsva() from the GSVA package. Default is 1. |
oneoff | Logical. Allow one-member clusters? Default is TRUE. |
stabThresh | Threshold for ending clustering. |
geneURL | Optional. Named list of URLs to gene information. |
genesetURL | Optional. Named list of URLs to geneset information. |
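
The `nFeats` and `featMetric` arguments together describe a feature-subsetting step. The base R sketch below shows one way such a step could look on a toy matrix. It assumes that `nFeats = "sqrt"` means keeping the square root of the number of features and that features are ranked by their MAD score; the object names (`dataMatrix`, `madScores`, `nKeep`, `topFeats`, `subMatrix`) are illustrative and this is not the package's internal code.

```r
set.seed(1)

# Toy P x N matrix: 100 features (rows) by 12 observations (columns)
dataMatrix <- matrix(
    rnorm(100 * 12),
    nrow = 100,
    dimnames = list(paste0("feat", 1:100), paste0("obs", 1:12))
)

# Score each feature by its median absolute deviation across observations
madScores <- apply(dataMatrix, 1, mad)

# Assumed reading of nFeats = "sqrt": keep sqrt(P) features, rounded up
nKeep <- ceiling(sqrt(nrow(dataMatrix)))

# Keep the highest-scoring features and subset the matrix to them
topFeats <- names(sort(madScores, decreasing = TRUE))[seq_len(nKeep)]
subMatrix <- dataMatrix[topFeats, , drop = FALSE]
```

Supplying a numeric `nFeats` instead would correspond to setting `nKeep` directly, provided it does not exceed the number of features.
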
An object of class `K2`.
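
A minimal call might look like the sketch below. It assumes the K2Taxonomer and Biobase packages are installed and uses the `sample.ExpressionSet` data set shipped with Biobase as a stand-in expression set; the feature sets in `genesetsExample` are made up for illustration. The call is skipped when either package is missing.

```r
# Sketch only: runs the call when both packages are installed, otherwise skips it.
K2res <- NULL
if (requireNamespace("K2Taxonomer", quietly = TRUE) &&
    requireNamespace("Biobase", quietly = TRUE)) {

    # Example ExpressionSet shipped with Biobase (chosen here for illustration)
    data("sample.ExpressionSet", package = "Biobase")

    # Hypothetical feature sets: a named list of feature (row) names
    feats <- Biobase::featureNames(sample.ExpressionSet)
    genesetsExample <- list(GS1 = feats[1:50], GS2 = feats[51:100])

    # All other arguments keep the defaults shown in the usage
    K2res <- K2Taxonomer::K2preproc(sample.ExpressionSet,
                                    genesets = genesetsExample)
}
```

If the call runs, `K2res` should hold the `K2` object described above; otherwise it stays `NULL`.
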
Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. ISSN 1362-4962, 0305-1048, doi: 10.1093/nar/gkv007, http://academic.oup.com/nar/article/43/7/e47/2414268/limma-powers-differential-expression-analyses-for.