Generates an object of class `K2` and runs the pre-processing functions required by the K2 Taxonomer procedure.
K2preproc(
object,
cohorts = NULL,
eMatDS = NULL,
colData = NULL,
vehicle = NULL,
variables = NULL,
seuAssay = "RNA",
seuAssayDS = "RNA",
sceAssay = "logcounts",
sceAssayDS = NULL,
block = NULL,
logCounts = TRUE,
use = c("Z", "MEAN"),
nFeats = "sqrt",
featMetric = c("F", "mad", "sd", "Sn"),
DGEmethod = c("limma", "mast"),
DGEexpThreshold = 0.25,
recalcDataMatrix = TRUE,
nBoots = 500,
useCors = 1,
clustFunc = "cKmeansDownsampleSqrt",
clustList = NULL,
linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete", "average",
"centroid"),
info = NULL,
genesets = NULL,
qthresh = 0.05,
cthresh = 0,
ntotal = 20000,
ScoreGeneSetMethod = c("GSVA", "AUCELL"),
oneoff = TRUE,
stabThresh = 0,
geneURL = NULL,
genesetURL = NULL
)
An object of class matrix, dgCMatrix, Seurat, SingleCellExperiment, or ExpressionSet. For matrix and dgCMatrix, include genes and observations (single-cell/bulk profiles) as rows and columns, respectively.
Character. The column in meta data of 'object' that has cohort IDs. Default NULL if no cohorts in data.
Numeric matrix. A matrix with the same number of observations as 'object' containing normalized expression data to be used in analyses downstream of partitioning results.
data.frame. Only used if 'object' is a matrix or dgCMatrix. A data frame with named rows and columns containing observation data for each column in 'object'.
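A minimal sketch of preparing matrix input together with its observation data, assuming the K2Taxonomer package is installed for the final call. The simulated values, the object names `eMat` and `cData`, and the column name `cohort` are illustrative assumptions, not taken from the package documentation.

```r
set.seed(1)

# Simulated log2 counts: 200 genes (rows) by 12 observations (columns)
counts <- matrix(rpois(200 * 12, lambda = 20), nrow = 200)
eMat <- log2(counts + 1)
rownames(eMat) <- paste0("gene", seq_len(nrow(eMat)))
colnames(eMat) <- paste0("obs", seq_len(ncol(eMat)))

# Observation data: one named row per column of the matrix
cData <- data.frame(
  cohort = rep(c("A", "B", "C", "D"), each = 3),
  row.names = colnames(eMat)
)

# Row names of the observation data should line up with the matrix columns
stopifnot(identical(rownames(cData), colnames(eMat)))

# The call itself only runs when K2Taxonomer is available
if (requireNamespace("K2Taxonomer", quietly = TRUE)) {
  K2res <- K2Taxonomer::K2preproc(
    eMat,
    colData = cData,
    cohorts = "cohort",
    nBoots = 100,
    useCors = 1
  )
}
```

Only arguments listed in the usage above are passed; everything else keeps its default.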
The value of the cohort variable that identifies the observations to use as the control. Default NULL if no vehicle is to be used.
Character. Columns in meta data of 'object' to control for in differential analyses.
Character. Name of the assay in the Seurat object containing expression data for running the partitioning algorithm. If cohorts are based on clustering, this should be the assay used.
Character. Name of the assay in the Seurat object containing normalized expression data to be used in analyses downstream of the partitioning algorithm.
Character. Name of the assay in the SingleCellExperiment object containing expression data for running the partitioning algorithm. If cohorts are based on clustering, this should be the assay used.
Character. Name of the assay in the SingleCellExperiment object containing normalized expression data to be used in analyses downstream of the partitioning algorithm.
Character. Block parameter in limma for modeling hierarchical data structure, such as multiple observations per individual.
Logical. Whether expression values are log-scale counts or log-normalized counts from RNA-seq. Default is TRUE.
Character. Options are 'Z' to generate test statistics or 'MEAN' to use means from differential analysis for clustering.
Either 'sqrt' or a numeric value <= the number of features, giving the number of features to subset for each partition.
Character. Metric to use to assign gene-level variance/signal score.
F: F-statistic from evaluating differences in means across cohorts
mad: Median absolute deviation
sd: Standard deviation
Sn: Robust scale estimator
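The metrics listed above can be sketched with base R on toy data. This is an illustration under stated assumptions rather than the package's own implementation: the F-statistic here comes from a one-way ANOVA via `oneway.test()`, and 'sqrt' is assumed to mean the square root of the number of features. Sn is left out because base R has no implementation of it (the robustbase package provides one).

```r
set.seed(2)

# Toy data: 100 features (rows), 12 observations (columns), 3 cohorts
x <- matrix(rnorm(100 * 12), nrow = 100)
cohort <- factor(rep(c("A", "B", "C"), each = 4))

# Shift the first five features in cohort C so they carry cohort signal
x[1:5, cohort == "C"] <- x[1:5, cohort == "C"] + 10

# Per-feature scores for three of the listed metrics
scoreSD <- apply(x, 1, sd)
scoreMAD <- apply(x, 1, mad)
scoreF <- apply(x, 1, function(v) {
  # One-way ANOVA F-statistic comparing cohort means
  oneway.test(v ~ cohort, var.equal = TRUE)$statistic
})

# Assumed meaning of nFeats = "sqrt": keep sqrt(number of features)
nKeep <- ceiling(sqrt(nrow(x)))

# Indices of the top-scoring features by F-statistic
topF <- order(scoreF, decreasing = TRUE)[seq_len(nKeep)]
```

The F-statistic uses cohort labels, whereas mad and sd score overall spread without reference to cohorts.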
Character. Method for running differential gene expression analyses. Use one of either 'limma' (default) or 'mast'.
Numeric. A value between 0 and 1 used to filter lowly expressed genes in partition-specific differential gene expression analyses. The filter is based on the proportion of observations with counts > 0 in at least one subgroup at a specific partition.
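The filter described above can be sketched as follows. The toy counts, the subgroup labels, and the use of `>=` for the comparison are assumptions made for illustration; the package may treat the boundary differently.

```r
# Toy counts: 4 genes (rows) by 8 observations (columns), two subgroups
counts <- rbind(
  geneA = c(5, 3, 0, 2, 0, 0, 0, 0),  # expressed in 3/4 of subgroup g1
  geneB = c(0, 0, 0, 1, 0, 0, 0, 0),  # expressed in 1/4 of subgroup g1 only
  geneC = c(0, 0, 0, 0, 0, 0, 0, 0),  # never expressed
  geneD = c(0, 1, 0, 0, 4, 2, 6, 1)   # expressed in all of subgroup g2
)
subgroup <- rep(c("g1", "g2"), each = 4)
threshold <- 0.5  # illustrative value; the documented default is 0.25

# Proportion of observations with counts > 0, per gene and subgroup
propExpr <- sapply(split(seq_along(subgroup), subgroup), function(idx) {
  rowMeans(counts[, idx, drop = FALSE] > 0)
})

# Keep a gene if the proportion reaches the threshold in any subgroup
keep <- apply(propExpr >= threshold, 1, any)
keptGenes <- names(keep)[keep]
```

Here geneD is kept even though it is rarely expressed in g1, because a single subgroup passing the threshold is enough.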
Logical. Recalculate dataMatrix for each partition? Default is TRUE.
Numeric. The number of bootstraps to run at each partition. Default is 500.
Numeric. Number of cores to use for parallelizable processes.
Character. Wrapper function to be used in recursive partitioning.
cKmeansDownsampleSqrt: Perform constrained K-means clustering after subsampling each cohort by the square root of the number of observations
cKmeansDownsampleSmallest: Perform constrained K-means clustering after subsampling each cohort by the size of the smallest cohort
hclustWrapper: Perform hierarchical clustering
Optional named list of parameters to use with clustFunc.
cKmeansDownsampleSqrt:
maxIter: The maximum number of iterations to use with lcvqe()
cKmeansDownsampleSmallest:
maxIter: The maximum number of iterations to use with lcvqe()
hclustWrapper:
aggMethod: One of the hierarchical methods accepted by the hclust() function
distMetric: One of the distance metrics accepted by the dist() function
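The parameter lists above might look like the following. The values (`maxIter = 10`, `"ward.D2"`, `"euclidean"`) and the object names are illustrative assumptions; the second half only shows what `aggMethod` and `distMetric` mean in terms of base R's `dist()` and `hclust()`, not how hclustWrapper is implemented.

```r
# Parameter lists shaped as described above, one per wrapper
clustListKmeans <- list(maxIter = 10)
clustListHclust <- list(aggMethod = "ward.D2", distMetric = "euclidean")

# Toy data: two well-separated groups of five observations each
set.seed(3)
x <- rbind(
  matrix(rnorm(5 * 20, mean = 0), nrow = 5),
  matrix(rnorm(5 * 20, mean = 6), nrow = 5)
)
rownames(x) <- paste0("obs", seq_len(nrow(x)))

# distMetric feeds dist(); aggMethod feeds hclust()
d <- dist(x, method = clustListHclust$distMetric)
hc <- hclust(d, method = clustListHclust$aggMethod)

# Cut the tree into two groups, mirroring a single two-way partition
groups <- cutree(hc, k = 2)
table(groups)
```

Either list would be supplied as `clustList` alongside the matching `clustFunc` value.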
Character. Linkage criteria for splitting cosine matrix ('method' in hclust). 'average' by default.
Character. A vector of column names in meta data of 'object' that contain information to be used for cohort annotation in the dashboard visualization.
Named list. Feature sets to be included in enrichment-based analyses.
Numeric. A value between 0 and 1 indicating the FDR cutoff to define feature sets.
Numeric. A positive value for the coefficient cutoff to define feature sets.
Numeric. A positive value to use as the background feature count. 20000 by default.
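One way the three thresholds above could interact is sketched below. The results table, the gene set, the directional use of the coefficient cutoff, and the hypergeometric test are all assumptions made for illustration; the documentation does not state how the package applies them.

```r
# Hypothetical differential analysis results for six genes
dge <- data.frame(
  gene = paste0("gene", 1:6),
  coef = c(1.2, 0.8, -0.5, 0.1, 2.0, 0.3),
  fdr = c(0.01, 0.20, 0.03, 0.04, 0.001, 0.60),
  stringsAsFactors = FALSE
)
qthresh <- 0.05
cthresh <- 0.5  # illustrative value; the documented default is 0
ntotal <- 20000

# Features passing both cutoffs define the feature set for this subgroup
sig <- dge$gene[dge$fdr < qthresh & dge$coef > cthresh]

# Hypothetical reference gene set to test for over-representation
geneset <- c("gene1", "gene5", "gene9", "gene10")
overlap <- length(intersect(sig, geneset))

# Hypergeometric upper tail, with ntotal as the background size
pval <- phyper(
  overlap - 1,
  length(geneset),
  ntotal - length(geneset),
  length(sig),
  lower.tail = FALSE
)
```

A larger `ntotal` makes the same overlap look more surprising, so the background count should reflect the number of features that could have been detected.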
Character. Method for gene set scoring. Use one of either 'GSVA' (default) or 'AUCELL'.
Logical. Allow 1 observation partition groups? Default is TRUE.
Numeric. A value between 0 and 1 indicating the threshold for ending clustering.
Named list. URLs linking genes to external resources.
Named list. URLs linking gene sets to external resources.
An object of class `K2`.
Reed ER, Monti S (2021). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Nucleic Acids Research. doi:10.1093/nar/gkab552, https://pubmed.ncbi.nlm.nih.gov/34226941/.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47–e47. ISSN 1362-4962, 0305-1048, doi:10.1093/nar/gkv007, http://academic.oup.com/nar/article/43/7/e47/2414268/limma-powers-differential-expression-analyses-for.
Rousseeuw PJ, Croux C (1993). “Alternatives to the Median Absolute Deviation.” Journal of the American Statistical Association, 88(424), 1273-1283. ISSN 0162-1459, doi:10.1080/01621459.1993.10476408.