This function generates an object of class K2 and runs the pre-processing functions required by the K2 Taxonomer procedure.
Usage
K2preproc(
object,
cohorts = NULL,
eMatDS = NULL,
colData = NULL,
vehicle = NULL,
variables = NULL,
seuAssay = "RNA",
seuAssayDS = "RNA",
sceAssay = "logcounts",
sceAssayDS = NULL,
block = NULL,
logCounts = TRUE,
use = c("Z", "MEAN"),
nFeats = "sqrt",
featMetric = c("F", "mad", "sd", "Sn"),
DGEmethod = c("limma", "mast"),
DGEexpThreshold = 0.25,
recalcDataMatrix = TRUE,
nBoots = 500,
useCors = 1,
clustFunc = "cKmeansDownsampleSqrt",
clustList = NULL,
linkage = c("mcquitty", "ward.D", "ward.D2", "single", "complete", "average",
"centroid"),
info = NULL,
genesets = NULL,
qthresh = 0.05,
cthresh = 0,
ntotal = 20000,
ScoreGeneSetMethod = c("GSVA", "AUCELL"),
oneoff = TRUE,
stabThresh = 0,
geneURL = NULL,
genesetURL = NULL
)
Arguments
- object
An object of class matrix, dgCMatrix, Seurat, SingleCellExperiment, or ExpressionSet. For matrix and dgCMatrix, include genes and observations (single-cell/bulk profiles) as rows and columns, respectively.
- cohorts
Character. The column in meta data of 'object' that contains cohort IDs. Default is NULL if there are no cohorts in the data.
- eMatDS
Numeric matrix. A matrix with the same number of observations as 'object' containing normalized expression data to be used in analyses downstream of partitioning results.
- colData
data.frame. Only used if 'object' is a matrix or dgCMatrix. A data frame with named rows and columns containing observation data for each column in 'object'.
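As a minimal sketch of the layout described for 'object' and 'colData', the following base R code builds a toy matrix with genes as rows and observations as columns, plus a data frame whose row names match the matrix column names. The gene names, observation names, and the "cohort" and "batch" columns are hypothetical and chosen only for illustration.

```r
# Toy expression matrix: 6 genes (rows) by 4 observations (columns).
set.seed(1)
eMat <- matrix(
  rnorm(24),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("obs", 1:4))
)

# Observation-level data: one named row per column of eMat.
# "cohort" and "batch" are hypothetical column names.
colData <- data.frame(
  cohort = c("ctrl", "ctrl", "trtA", "trtB"),
  batch = c("b1", "b2", "b1", "b2"),
  row.names = colnames(eMat)
)

# Sanity check: rows of colData line up with columns of eMat.
stopifnot(identical(rownames(colData), colnames(eMat)))
```

With objects shaped like these, one would presumably pass object = eMat and colData = colData, with cohorts = "cohort" and, if a control is wanted, vehicle = "ctrl".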
- vehicle
The value in the cohort variable that identifies the observations to use as the control. Default is NULL if no vehicle is to be used.
- variables
Character. Columns in meta data of 'object' to control for in differential analyses.
- seuAssay
Character. Name of assay in Seurat object containing expression data for running partitioning algorithm. If cohorts are based on clustering, this should be the assay used.
- seuAssayDS
Character. Name of assay in Seurat object containing normalized expression data to be used in analyses downstream of partitioning algorithm.
- sceAssay
Character. Name of assay in SingleCellExperiment object containing expression data for running partitioning algorithm. If cohorts are based on clustering, this should be the assay used.
- sceAssayDS
Character. Name of assay in SingleCellExperiment object containing normalized expression data to be used in analyses downstream of partitioning algorithm.
- block
Character. Block parameter in limma for modeling hierarchical data structure, such as multiple observations per individual.
- logCounts
Logical. Whether expression values are log-scale counts or log-normalized counts from RNA-seq. Default is TRUE.
- use
Character. Options are 'Z' to generate test statistics or 'MEAN' to use means from differential analysis for clustering.
- nFeats
'sqrt' or a numeric value <= the number of features. Sets how many features to subset for each partition.
- featMetric
Character. Metric to use to assign gene-level variance/signal score.
F: F-statistic from evaluating differences in means across cohorts
mad: Median absolute deviation
sd: Standard deviation
Sn: Robust scale estimator
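To make the feature scoring and 'nFeats' options more concrete, here is a rough base R sketch that scores genes with three of the listed metrics and keeps the top-scoring ones. It is not the package's internal code: the toy data and cohort labels are hypothetical, reading 'sqrt' as the ceiling of the square root of the number of features is an assumption, and 'Sn' is left out because base R has no built-in implementation of it.

```r
set.seed(42)
# Toy data: 100 genes by 12 observations in 3 hypothetical cohorts.
eMat <- matrix(
  rnorm(100 * 12),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("obs", 1:12))
)
cohort <- factor(rep(c("A", "B", "C"), each = 4))

# Gene-level scores for three of the documented metrics.
scoreSD <- apply(eMat, 1, sd)
scoreMAD <- apply(eMat, 1, mad)
# One-way ANOVA F-statistic per gene, comparing means across cohorts.
scoreF <- apply(eMat, 1, function(x) {
  oneway.test(x ~ cohort, var.equal = TRUE)$statistic
})

# Assumed reading of nFeats = "sqrt": ceiling(sqrt(number of features)).
nFeats <- "sqrt"
nKeep <- if (identical(nFeats, "sqrt")) ceiling(sqrt(nrow(eMat))) else nFeats

# Keep the nKeep genes with the largest score (MAD used here).
topGenes <- names(sort(scoreMAD, decreasing = TRUE))[seq_len(nKeep)]
```

Swapping scoreMAD for scoreSD or scoreF in the last line gives the corresponding selection under the other metrics.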
- DGEmethod
Character. Method for running differential gene expression analyses. Use one of either 'limma' (default) or 'mast'.
- DGEexpThreshold
Numeric. A value between 0 and 1 used to filter lowly expressed genes for partition-specific differential gene expression. A gene is retained if the proportion of observations with counts > 0 meets this threshold in at least one subgroup at a specific partition.
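The filter described for 'DGEexpThreshold' can be sketched in base R as follows. The count matrix and subgroup labels are hypothetical, and whether the comparison against the threshold is inclusive is an assumption; the toy values are chosen so that the result is the same either way.

```r
# Toy count matrix: 5 genes by 10 observations in two hypothetical subgroups.
counts <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  geneB = c(3, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  geneC = c(1, 2, 0, 0, 0, 0, 0, 0, 0, 0),
  geneD = c(0, 0, 0, 0, 0, 4, 1, 2, 5, 0),
  geneE = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0)
)
subgroup <- rep(c("g1", "g2"), each = 5)
DGEexpThreshold <- 0.25

# Proportion of observations with counts > 0, per gene and subgroup.
propExpressed <- sapply(split(seq_along(subgroup), subgroup), function(idx) {
  rowMeans(counts[, idx, drop = FALSE] > 0)
})

# Keep a gene if at least one subgroup reaches the threshold
# (using >= here is an assumption about the boundary case).
keep <- apply(propExpressed >= DGEexpThreshold, 1, any)
filtered <- counts[keep, , drop = FALSE]
```

Here geneC (expressed in 2 of 5 observations in g1) and geneD (4 of 5 in g2) pass, while the other genes never exceed 1 in 5 in either subgroup.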
- recalcDataMatrix
Logical. Recalculate dataMatrix for each partition? Default is TRUE.
- nBoots
Numeric. The number of bootstraps to run at each partition. Default is 500.
- useCors
Numeric. Number of cores to use for parallelizable processes.
- clustFunc
Character. Wrapper function to be used in recursive partitioning.
cKmeansDownsampleSqrt: Perform constrained K-means clustering after subsampling each cohort by the square root of the number of observations
cKmeansDownsampleSmallest: Perform constrained K-means clustering after subsampling each cohort by the size of the smallest cohort
hclustWrapper: Perform hierarchical clustering
- clustList
Optional named list of parameters to use with clustFunc.
cKmeansDownsampleSqrt:
maxIter: The maximum number of iterations to use with lcvqe()
cKmeansDownsampleSmallest:
maxIter: The maximum number of iterations to use with lcvqe()
hclustWrapper:
aggMethod: One of the hierarchical methods specified by hclust() function
distMetric: One of the distance metrics specified by dist() function
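As an illustration of the shapes 'clustList' might take, the sketch below builds one named list for the constrained K-means wrappers and one for hclustWrapper, then confirms that the hclustWrapper choices are values that base R's hclust() and dist() accept. The specific values (10, "average", "euclidean") and the toy matrix are illustrative assumptions, not recommended settings.

```r
# Named lists using the parameter names documented for each wrapper.
# The values are illustrative only.
clustListKmeans <- list(maxIter = 10)
clustListHclust <- list(aggMethod = "average", distMetric = "euclidean")

# Methods accepted by stats::hclust() and stats::dist().
validAgg <- c("ward.D", "ward.D2", "single", "complete", "average",
              "mcquitty", "median", "centroid")
validDist <- c("euclidean", "maximum", "manhattan", "canberra",
               "binary", "minkowski")
stopifnot(clustListHclust$aggMethod %in% validAgg,
          clustListHclust$distMetric %in% validDist)

# Confirm the pair works together on a small toy matrix.
set.seed(3)
x <- matrix(rnorm(20), nrow = 5)
hc <- hclust(dist(x, method = clustListHclust$distMetric),
             method = clustListHclust$aggMethod)
```

A list like clustListHclust would be paired with clustFunc = "hclustWrapper", and clustListKmeans with either of the cKmeansDownsample wrappers.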
- linkage
Character. Linkage criteria for splitting cosine matrix ('method' in hclust). One of the values listed in the usage; the first, 'mcquitty', is the default.
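To show what splitting a cosine matrix with a chosen linkage can look like, the following base R sketch computes cosine similarities between rows of a toy matrix, converts them to distances, and cuts the resulting hclust() tree into two groups. The toy data, the 1 - similarity conversion, and the use of cutree() are assumptions made for illustration and may differ from what the package does internally.

```r
# Toy data: six items (rows) forming two obvious groups.
x <- rbind(
  a1 = c(1.0, 1.0, 0.0, 0.0),
  a2 = c(1.0, 0.9, 0.1, 0.0),
  a3 = c(0.9, 1.0, 0.0, 0.1),
  b1 = c(0.0, 0.0, 1.0, 1.0),
  b2 = c(0.1, 0.0, 1.0, 0.9),
  b3 = c(0.0, 0.1, 0.9, 1.0)
)

# Cosine similarity between every pair of rows.
norms <- sqrt(rowSums(x^2))
cosSim <- (x %*% t(x)) / (norms %o% norms)

# Convert similarity to distance and cluster with one of the listed linkages.
linkage <- "mcquitty"
hc <- hclust(as.dist(1 - cosSim), method = linkage)

# Split the tree into two groups.
groups <- cutree(hc, k = 2)
```

Changing the value of linkage to any other option from the usage, such as "average" or "complete", changes only how distances between merged clusters are updated.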
- info
Character. A vector of column names in meta data of 'object' that contain information to be used in cohort annotation of dashboard visualization.
- genesets
Named list. Feature sets to be included in enrichment-based analyses.
- qthresh
Numeric. A value between 0 and 1 indicating the FDR cutoff to define feature sets.
- cthresh
Numeric. A positive value for the coefficient cutoff to define feature sets.
- ntotal
Numeric. A positive value to use as the background feature count. 20000 by default.
- ScoreGeneSetMethod
Character. Method for gene set scoring. Use one of either 'GSVA' (default) or 'AUCELL'.
- oneoff
Logical. Allow 1 observation partition groups? Default is TRUE.
- stabThresh
Numeric. A value between 0 and 1 indicating the threshold for ending clustering.
- geneURL
Named list. URLs linking genes to external resources.
- genesetURL
Named list. URLs linking gene sets to external resources.
References
Reed ER, Monti S (2021). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Nucleic Acids Research. doi:10.1093/nar/gkab552, https://pubmed.ncbi.nlm.nih.gov/34226941/.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. ISSN 1362-4962, 0305-1048, doi:10.1093/nar/gkv007, http://academic.oup.com/nar/article/43/7/e47/2414268/limma-powers-differential-expression-analyses-for.
Rousseeuw PJ, Croux C (1993). “Alternatives to the Median Absolute Deviation.” Journal of the American Statistical Association, 88(424), 1273-1283. ISSN 0162-1459, doi:10.1080/01621459.1993.10476408.