This function is a wrapper for the constrained K-means algorithm using `lcvqe` from the `conclust` package. It subsamples each cohort down to the size of the smallest cohort. It is not meant to be run on its own, but to be passed as the 'clustFunc' argument when running `K2preproc()`, `runK2Taxonomer()`, and `K2tax()`.

cKmeansWrapperSubsample(dataMatrix, clustList)

Arguments

dataMatrix

A P x C numeric matrix of data, where C is the number of cohort labels.

clustList

List of objects to use for the clustering procedure.

  • 'eMat': P x N expression matrix of the data set. See `?Biobase::exprs()`.

  • 'labs': Vector of cohort labels of the observations in the data set, corresponding to the columns of `clustList$eMat`.
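
The idea of reducing every cohort to the size of the smallest one can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy `eMat` and `labs` are made-up stand-ins for `clustList$eMat` and `clustList$labs`, and the names `nMin`, `keep`, `eSub`, and `labsSub` are hypothetical.

```r
# Toy stand-ins for clustList$eMat and clustList$labs (made-up values)
set.seed(1)
labs <- c(rep("A", 5), rep("B", 3), rep("C", 4))
eMat <- matrix(rnorm(4 * length(labs)), nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("obs", seq_along(labs))))

# Number of observations in the smallest cohort
nMin <- min(table(labs))

# Draw nMin column indices at random from each cohort
keep <- unlist(lapply(split(seq_along(labs), labs), function(idx) {
    idx[sample.int(length(idx), nMin)]
}), use.names = FALSE)

# Subsampled matrix and labels; every cohort now has nMin observations
eSub <- eMat[, keep, drop = FALSE]
labsSub <- labs[keep]
table(labsSub)
```

Indexing through `sample.int()` rather than calling `sample()` on the indices directly avoids the surprising behavior of `sample()` when a cohort contains a single observation.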

Value

A character string of concatenated 1's and 2's pertaining to the cluster assignment of each column in dataMatrix.
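
A returned string of this form can be unpacked into one assignment per column with base R. The sketch below assumes a five-cohort result; `res` is a hand-typed example value and `cohortNames` is a hypothetical stand-in for `colnames(dataMatrix)`.

```r
# Example value of the kind this function returns (typed by hand here)
res <- "11222"

# Hypothetical cohort names; in practice these would be colnames(dataMatrix)
cohortNames <- paste0("cohort", 1:5)

# Split the string into single characters and convert to integers
memb <- as.integer(strsplit(res, "")[[1]])
names(memb) <- cohortNames

# List the cohorts that fall on each side of the K=2 split
split(names(memb), memb)
```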

References

Reed ER, Monti S (2020). “Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data.” Bioinformatics. doi: 10.1101/2020.11.05.370197, http://biorxiv.org/lookup/doi/10.1101/2020.11.05.370197.

Wagstaff K, Cardie C, Rogers S, Schrodl S (2001). “Constrained K-means Clustering with Background Knowledge.” In ICML, 577--584.

Examples

dat <- scRNAseq::ReprocessedAllenData(assays='rsem_tpm')[seq_len(50),]
#> snapshotDate(): 2020-10-27
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
## Packages providing ExpressionSet(), pData(), exprs(), assay() and colData()
library(K2Taxonomer)
library(Biobase)
library(SummarizedExperiment)

eSet <- ExpressionSet(assayData=assay(dat))
pData(eSet) <- as.data.frame(colData(dat))
exprs(eSet) <- log2(exprs(eSet) + 1)

## Subset for fewer cluster labels for this example
eSet <- eSet[, !is.na(eSet$Primary.Type) &
    eSet$Primary.Type %in% c('L4 Arf5', 'L4 Ctxn3', 'L4 Scnn1a',
        'L5 Ucma', 'L5a Batf3')]

## Create cell type variable without spaces
eSet$celltype <- gsub(' ', '_', eSet$Primary.Type)

## Create clustList
cL <- list(
    eMat=exprs(eSet),
    labs=eSet$celltype,
    maxIter=10)

## Run K2preproc to generate data matrix
## with a column for each celltype.
K2res <- K2preproc(eSet,
    cohorts='celltype',
    featMetric='F',
    logCounts=TRUE)
#> Collapsing group-level values with LIMMA.
dm <- K2data(K2res)

## Generate K=2 split with constrained K-means
cKmeansWrapperSubsample(dm, cL)
#> [1] "11222"