R/cKmeansWrapperSubsample.R
cKmeansWrapperSubsample.Rd
This function is a wrapper for the constrained K-means algorithm using `lcvqe` from the `conclust` package. It subsamples each cohort down to the size of the cohort with the fewest observations. It is not meant to be run on its own, but passed as the 'clustFunc' argument to `K2preproc()`, `runK2Taxonomer()`, and `K2tax()`.
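The subsampling step described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper name `subsampleCohorts`, its `seed` argument, and the toy label vector are all hypothetical.

```r
# Hypothetical helper (not part of the package): keep an equal number of
# observations per cohort, matching the size of the smallest cohort.
subsampleCohorts <- function(labs, seed = 1) {
  set.seed(seed)
  nMin <- min(table(labs))  # size of the smallest cohort
  idx <- unlist(lapply(split(seq_along(labs), labs), function(i) {
    # draw nMin positions without replacement from this cohort
    i[sample.int(length(i), nMin)]
  }), use.names = FALSE)
  sort(idx)
}

# Toy cohort labels: cohort sizes are 3, 2, and 4
labs <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")
keep <- subsampleCohorts(labs)
table(labs[keep])  # every cohort now has 2 observations
```

Indexing with `sample.int()` rather than calling `sample()` on the index vector directly avoids the surprise where `sample()` treats a length-one numeric vector as a range.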
cKmeansWrapperSubsample(dataMatrix, clustList)
dataMatrix | A P x C numeric matrix of data, where C is the number of cohort labels. |
---|---|
clustList | List of objects to use for the clustering procedure. |
A character string of concatenated 1's and 2's pertaining to the cluster assignment of each column in dataMatrix.
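As a small illustration of this return format, the sketch below builds such a string from a vector of cluster assignments and decodes it again. The `assignments` vector is made up for the example and does not come from the package.

```r
# Hypothetical cluster assignments, one per column of dataMatrix
assignments <- c(1, 1, 2, 2, 2)

# Concatenate into a single string of 1's and 2's
modString <- paste(assignments, collapse = "")
modString

# Decode back into an integer vector, one entry per column
decoded <- as.integer(strsplit(modString, "")[[1]])
decoded
```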
Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

Wagstaff K, Cardie C, Rogers S, Schrodl S (2001). “Constrained K-means Clustering with Background Knowledge.” In ICML, 577--584.
    eSet <- ExpressionSet(assayData=assay(dat))
    pData(eSet) <- as.data.frame(colData(dat))
    exprs(eSet) <- log2(exprs(eSet) + 1)

    ## Subset for fewer cluster labels for this example
    eSet <- eSet[, !is.na(eSet$Primary.Type) &
        eSet$Primary.Type %in% c('L4 Arf5', 'L4 Ctxn3', 'L4 Scnn1a',
            'L5 Ucma', 'L5a Batf3')]

    ## Create cell type variable without spaces
    eSet$celltype <- gsub(' ', '_', eSet$Primary.Type)

    ## Create clustList
    cL <- list(
        eMat=exprs(eSet),
        labs=eSet$celltype,
        maxIter=10)

    ## Run K2preproc to generate data matrix
    ## with a column for each celltype.
    K2res <- K2preproc(eSet,
        cohorts='celltype',
        featMetric='F',
        logCounts=TRUE)
    #> Collapsing group-level values with LIMMA.
    #> [1] "11222"