Generates an object of class `K2` and runs the pre-processing functions required by the K2 Taxonomer procedure.

K2preproc(
    eSet,
    cohorts = NULL,
    vehicle = NULL,
    covariates = NULL,
    block = NULL,
    logCounts = FALSE,
    use = c("Z", "MEAN"),
    nFeats = "sqrt",
    featMetric = c("mad", "sd", "Sn", "Qn", "F", "square"),
    recalcDataMatrix = FALSE,
    nBoots = 500,
    clustFunc = hclustWrapper,
    clustCors = 1,
    clustList = list(),
    linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete",
        "average", "centroid"),
    info = NULL,
    infoClass = NULL,
    genesets = NULL,
    qthresh = 0.05,
    cthresh = 0,
    ntotal = 20000,
    ssGSEAalg = c("gsva", "ssgsea", "zscore", "plage"),
    ssGSEAcores = 1,
    oneoff = TRUE,
    stabThresh = 0,
    geneURL = NULL,
    genesetURL = NULL
)

Arguments

eSet

An expression set object.

cohorts

Name of the column in the phenotype data of eSet that contains cohort IDs. Set to NULL (default) if the data should not be pre-processed.

vehicle

The value of the cohort variable that identifies the vehicle (control) samples. Default is NULL if no vehicle is to be used.

covariates

Covariates in the phenotype data of eSet to control for in the differential analysis.

block

Blocking variable passed to limma for modelling random-like effects.

logCounts

Logical. Whether the expression values are log-scale counts or log-normalized counts from RNA-seq. Default is FALSE.

use

Character string. Options are 'Z' (default) to use test statistics or 'MEAN' to use means from the differential analysis for clustering.

nFeats

Either 'sqrt' or a numeric value no greater than the number of features, giving the number of features to subset for each partition.

featMetric

Metric used to assign a variance/signal score to each feature. Options are 'mad' (default) to use MAD scores, 'sd', 'Sn', 'Qn', 'F', and 'square' to use squared values.
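A minimal base-R sketch of how nFeats and featMetric could interact: score each feature (row), then keep the top-scoring ones. The helper select_top_features() is hypothetical and not part of K2Taxonomer; treating 'sqrt' as the rounded square root of the feature count, and 'square' as the row sum of squared values, are assumptions.

```r
# Hypothetical helper, not part of K2Taxonomer: score each feature (row)
# and keep the highest-scoring ones.
select_top_features <- function(mat, nFeats = "sqrt",
                                featMetric = c("mad", "square")) {
  featMetric <- match.arg(featMetric)
  # Assumption: 'sqrt' means round(sqrt(number of features))
  n_keep <- if (identical(nFeats, "sqrt")) {
    round(sqrt(nrow(mat)))
  } else {
    as.integer(nFeats)
  }
  stopifnot(n_keep >= 1, n_keep <= nrow(mat))
  scores <- switch(featMetric,
    mad    = apply(mat, 1, stats::mad),  # robust spread of each feature
    square = rowSums(mat^2)              # assumption: sum of squared values
  )
  # Indices of the n_keep largest scores, highest first
  keep <- order(scores, decreasing = TRUE)[seq_len(n_keep)]
  mat[keep, , drop = FALSE]
}

set.seed(1)
toy <- matrix(rnorm(100 * 6), nrow = 100,
              dimnames = list(paste0("g", 1:100), paste0("s", 1:6)))
sub <- select_top_features(toy, nFeats = "sqrt", featMetric = "mad")
dim(sub)  # 10 features x 6 samples
```

With 100 features, 'sqrt' keeps 10 of them under this assumption.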

recalcDataMatrix

Logical. Recalculate dataMatrix for each partition? Default is FALSE.

nBoots

Number of bootstrap iterations to run at each split. Default is 500.
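A minimal sketch of bootstrap split stability, assuming features are resampled with replacement and the observations re-split into two groups at each iteration. The helper split_once() and the use of hclust() with Euclidean distance are illustrative assumptions, not the package's internal procedure.

```r
set.seed(3)
# Toy P x N matrix: 20 features, two groups of 4 observations
a <- c(rep(5, 10), rep(0, 10))
b <- c(rep(0, 10), rep(5, 10))
mat <- cbind(replicate(4, a + rnorm(20, sd = 0.5)),
             replicate(4, b + rnorm(20, sd = 0.5)))

# Hypothetical helper: split observations (columns) into two groups
split_once <- function(m) cutree(hclust(dist(t(m))), k = 2)

ref <- split_once(mat)
nBoots <- 100
agree <- vapply(seq_len(nBoots), function(i) {
  idx <- sample(nrow(mat), replace = TRUE)  # resample features with replacement
  boot <- split_once(mat[idx, , drop = FALSE])
  # Same partition as the reference, allowing the two labels to be swapped
  all(boot == ref) || all(boot == 3 - ref)
}, logical(1))

mean(agree)  # proportion of bootstraps reproducing the reference split
```

More bootstrap iterations give a less noisy estimate of this proportion at the cost of run time.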

clustFunc

Wrapper function used to cluster a P x N matrix (see Details).

clustCors

Number of cores to use for clustering.

clustList

List of objects to use for clustering procedure.

linkage

Linkage criterion for splitting the cosine matrix ('method' in hclust). 'mcquitty' by default.
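A minimal sketch of splitting observations with hclust() on a cosine-based distance, using the default linkage 'mcquitty'. The helper cosine_sim() and the use of 1 - cosine similarity as the distance are assumptions; the package's hclustWrapper may prepare the matrix differently.

```r
# Cosine similarity between the columns (observations) of a P x N matrix
cosine_sim <- function(mat) {
  unit <- sweep(mat, 2, sqrt(colSums(mat^2)), "/")  # unit-length columns
  crossprod(unit)                                   # N x N dot products
}

set.seed(2)
# Toy data: two groups of 4 observations pointing in different directions
a <- c(rep(5, 10), rep(0, 10))
b <- c(rep(0, 10), rep(5, 10))
mat <- cbind(replicate(4, a + rnorm(20, sd = 0.5)),
             replicate(4, b + rnorm(20, sd = 0.5)))
colnames(mat) <- paste0("s", 1:8)

sim <- cosine_sim(mat)
hc <- hclust(as.dist(1 - sim), method = "mcquitty")
groups <- cutree(hc, k = 2)  # two-way split of the observations
groups
```

Any other value of linkage can be passed as method to hclust() in the same way.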

info

A data frame with rownames that match column names in dataMatrix.

infoClass

A named vector denoting the type of test to run on each metavariable.

genesets

A named list of feature sets, each containing features from the row names of dataMatrix.

qthresh

A numeric value between 0 and 1 giving the FDR cutoff used to define feature sets.

cthresh

A non-negative value giving the coefficient cutoff used to define feature sets.
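A minimal sketch of how qthresh and cthresh could be combined to define up- and down-regulated feature sets from one differential-analysis table. The table and its column names are hypothetical, and applying cthresh to coefficients in both directions after Benjamini-Hochberg FDR adjustment is an assumption.

```r
# Hypothetical results of a differential analysis for one cluster
res <- data.frame(
  feature = paste0("g", 1:6),
  coef    = c(2.1, -1.5, 0.3, 1.2, -0.1, 0.8),
  pval    = c(1e-6, 1e-4, 0.20, 0.004, 0.90, 0.03),
  stringsAsFactors = FALSE
)
qthresh <- 0.05
cthresh <- 1

# Benjamini-Hochberg FDR (assumption: the package's adjustment may differ)
res$fdr <- p.adjust(res$pval, method = "BH")

pass <- res$fdr < qthresh
up   <- res$feature[pass & res$coef >  cthresh]  # assumption: up-regulated set
down <- res$feature[pass & res$coef < -cthresh]  # assumption: down-regulated set
up    # "g1" "g4"
down  # "g2"
```

Feature g6 passes the FDR cutoff (0.045) but is excluded because its coefficient is below cthresh.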

ntotal

A positive value to use as the background feature count. 20000 by default.
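A minimal sketch of why the background count matters, assuming ntotal serves as the population size in a one-sided hypergeometric test of the overlap between a feature set and a gene set. All counts below are hypothetical.

```r
ntotal   <- 20000  # background feature count (the argument's default)
set_size <- 200    # hypothetical gene set size
n_feats  <- 150    # hypothetical feature set size
overlap  <- 12     # hypothetical overlap between the two

# Expected overlap by chance: n_feats * set_size / ntotal
expected <- n_feats * set_size / ntotal

# P(X >= overlap) for X ~ Hypergeometric(set_size, ntotal - set_size, n_feats)
p <- phyper(overlap - 1, set_size, ntotal - set_size, n_feats,
            lower.tail = FALSE)
```

A larger ntotal lowers the expected overlap, so the same observed overlap yields a smaller p-value.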

ssGSEAalg

A character string specifying which algorithm to use when running the gsva() function from the GSVA package. Options are 'gsva', 'ssgsea', 'zscore', and 'plage'. 'gsva' by default.

ssGSEAcores

Number of cores to use for running gsva() from the GSVA package. Default is 1.

oneoff

Logical. Allow single-member clusters? Default is TRUE.

stabThresh

Stability threshold used to decide when to stop clustering. Default is 0.

geneURL

Optional. Named list of URLs to gene information.

genesetURL

Optional. Named list of URLs to geneset information.

Value

An object of class, `K2`.

References

Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. ISSN 1362-4962, 0305-1048, doi: 10.1093/nar/gkv007, http://academic.oup.com/nar/article/43/7/e47/2414268/limma-powers-differential-expression-analyses-for.

Examples

## Load packages and read in ExpressionSet object
library(Biobase)
library(K2Taxonomer)
data(sample.ExpressionSet)

## Pre-process and create K2 object
K2res <- K2preproc(sample.ExpressionSet)