Create OmicSignature

devtools::load_all(".")
library(dplyr)

1. Cheatsheet

An OmicSignature object contains three parts:

metadata, a list.
Required fields:
“signature_name”, “organism”, “direction_type”, “assay_type”, “phenotype”.
Recommended optional fields, if applicable:
“platform”, “sample_type”, “description”, “covariates”, “score_cutoff”, “adj_p_cutoff”.
signature, a data frame.
Required columns:
“probe_id” (unique identifier for each feature)
“feature_name” (e.g. ENSEMBL ID, Uniprot ID)
Required for bi-directional and categorical signatures: “direction”
Recommended optional column, if applicable: “score”
difexp (optional), a data frame of differential expression analysis results.
Required columns:
“probe_id” (unique identifier for each feature)
“feature_name” (e.g. ENSEMBL ID, Uniprot ID)
“score”
At least one of the following: “p_value”, “q_value”, or “adj_p”.
Recommended optional column, if applicable: “gene_symbol”

Create the object:

OmS <- OmicSignature$new(
  metadata = metadata,
  signature = signature,
  difexp = difexp
)

2. Create an `OmicSignature` Object Step-by-Step

The example provided below is from an experiment evaluating the effects of Myc reduced expression by comparing liver profiles of 24-month old Myc $^{+/+}$ vs. Myc $^{+/-}$ mice. This is a bi-directional signature example, since it contains up and down regulated features (genes). For ease of exposition, only the top 1000 genes are here included.

2.1. Metadata

A list with the following required fields:
“signature_name”, “organism”, “direction_type”, “assay_type”, “phenotype”.
Not required, but highly recommended fields:
“platform”, “sample_type”, “description”, “covariates”, “score_cutoff”, “adj_p_cutoff”, “logfc_cutoff”.

Option 1: Create metadata by hand. This is not recommended, because typos can occur.

metadata <- list(
  "signature_name" = Myc_reduce_mice_liver_24m,
  "organism" = "Mus musculus",
  "sample_type" = "liver",
  "phenotype" = "Myc_reduce",
  "direction_type" = "bi-directional",
  "assay_type" = "transcriptomics", 
  "platform" = "transcriptomics by array"
)

Option 2: Use createMetadata() (recommended).
This function helps remind you of the built-in attributes. The full list of current built-in attributes is shown here.
You can also provide your own customized attributes using the “others” field.

metadata <- createMetadata(
  # required attributes:
  signature_name = "Myc_reduce_mice_liver_24m",
  organism = "Mus musculus",
  direction_type = "bi-directional",
  assay_type = "transcriptomics",
  phenotype = "Myc_reduce",

  # optional and recommended:
  covariates = "none",
  description = "mice Myc haploinsufficient (Myc(+/-))",
  platform = "transcriptomics by array",
  sample_type = "liver", # use BRENDA ontology

  # optional cut-off attributes.
  # specifying them can facilitate the extraction of signatures.
  logfc_cutoff = NULL,
  p_value_cutoff = NULL,
  adj_p_cutoff = 0.05,
  score_cutoff = 5,

  # other optional built-in attributes:
  keywords = c("Myc", "KO", "longevity"),
  cutoff_description = NULL,
  author = NULL,
  PMID = 25619689,
  year = 2015,

  # example of customized attributes:
  others = list("animal_strain" = "C57BL/6")
)

2.1.1 “phenotype”

“phenotype” is free text, one or two-word description of the experimental conditions, such as a drug treatment, a gene knock out, or a clinical trait. For example, age or breast cancer.
Providing a detailed (free text) “description” is highly recommended. For instance, it may include information about how the treatment was administered and how each group was defined.

2.1.2 “organism

“organism” is free-text. We provide a list of common organisms, which you can search using searchOrganism. Other entries are allowed, but please use standard naming conventions (e.g., “Homo sapiens”, “Mus musculus”) to ensure consistency.

searchOrganism("homo")
#> [1] "Homo sapiens"

2.1.3 “sample_type” and “platform”

While both “sample_type” and “platform” are free text, it is recommended that predefined terms are used. Thus, “sample_type” should be a BRENDA ontology term, and “platform” should be one of a set of predefined platforms, if appropriate. You can search for predefined terms via the searchSampleType and searchPlatform functions. See “Sample Type & Platform Info” for details.

2.1.4 “direction_type”

direction_type must be one of the following:

“uni-directional”. Only a list of significant feature names is available. Examples including “genes mutated in a disease” and “markers of a cell type”.
“bi-directional”. Significant features can be grouped into “up” and “down” categories. For example, when comparing treatment vs. control groups, some features will be higher (“up”, or “+”) and some will be lower (“down” or “-”) in the treatment group. When the phenotype is a continuous trait, such as age, some features will increase (“up”, or “+”) with age, while others will decrease (“down”, or “-”) with age.
“categorical”. Used with multi-valued categorical phenotypes (e.g., “A” vs. “B” vs. “C”), usually analyzed by ANOVA.

2.1.5 “assay_type”

assay_type is one of the following:
- “transcriptomics” (e.g. RNA-seq, micro-array)
- “proteomics”
- “metabolomics”
- “methylomics”
- “genetic_variations” (e.g. SNP, GWAS)
- “DNA_binding_sites” (e.g. ChIP-seq)
- “other”

2.2. Signature

signature is a dataframe with columns “probe_id” and “feature_name”. If the signature is “bi-directional” or “categorical” (specified in direction_type in metadata), then column “direction” is also required.
An optional column “score” is recommended when applicable.

Option 1: Extract signature from a differential analysis results table.
In this example, we extract a bi-directional signature from a difexp object (see next section) using the score_cutoff and adj_p_cutoff specified in the metadata. A difexp object is simply a properly formatted data frame reporting the results of a differential analysis (e.g., by Limma). It looks something like this:

#>   probe_id  logfc  mean  score p_value adj_p      b       feature_name
#> 1 10345228 -0.167 7.106 -1.470   0.186 0.560 -5.866 ENSMUSG00000103746
#> 2 10354534  0.041 4.351  0.520   0.620 0.870 -6.780 ENSMUSG00000060715
#> 3 10354529 -0.175 4.955 -0.941   0.379 0.731 -6.458 ENSMUSG00000043629
#> 4 10346337  0.025 8.621  0.188   0.857 0.962 -6.911 ENSMUSG00000038323
#> 5 10353792 -0.025 6.063 -0.284   0.785 0.936 -6.885 ENSMUSG00000045815
#> 6 10350848 -0.055 7.595 -0.630   0.549 0.836 -6.712 ENSMUSG00000049881

We can then extract the significant features as follows:

signature <- difexp %>%
  dplyr::filter(abs(score) > metadata$score_cutoff & adj_p < metadata$adj_p_cutoff) %>%
  dplyr::select(probe_id, feature_name, score) %>%
  dplyr::mutate(direction = ifelse(score > 0, "+", "-"))
head(signature)
#>   probe_id       feature_name   score direction
#> 1 10346882 ENSMUSG00000025964  -6.990         -
#> 2 10353878 ENSMUSG00000067653  -7.867         -
#> 3 10349648 ENSMUSG00000004552  14.762         +
#> 4 10355278 ENSMUSG00000062209   6.083         +
#> 5 10353192 ENSMUSG00000025932  10.487         +
#> 6 10345762 ENSMUSG00000026072 -13.543         -

Function standardizeSigDF() can help remove duplicated and empty names.

signature <- standardizeSigDF(signature)
head(signature)
#>   probe_id       feature_name   score direction
#> 1 10349648 ENSMUSG00000004552  14.762         +
#> 2 10345762 ENSMUSG00000026072 -13.543         -
#> 3 10353192 ENSMUSG00000025932  10.487         +
#> 4 10355259 ENSMUSG00000061816 -10.315         -
#> 5 10351477 ENSMUSG00000102418   8.818         +
#> 6 10353878 ENSMUSG00000067653  -7.867         -

Option 2: Manually create signature dataframe.
For uni-directional signature:

signature <- data.frame("probe_id" = c(1, 2, 3), "feature_name" = c("gene1", "gene2", "gene3"))

For bi-directional signature:

signature <- data.frame(
  "probe_id" = c(1, 2, 3),
  "feature_name" = c("gene1", "gene2", "gene3"),
  "score" = c(0.45, -3.21, 2.44),
  "direction" = c("+", "-", "+")
)

For categorical signature:

signature <- data.frame(
  "probe_id" = c(1, 2, 3, 4),
  "feature_name" = c("gene1", "gene2", "gene3", "gene4"),
  "score" = c(0.45, -3.21, 2.44, -2.45),
  "direction" = c("group1", "group1", "group2", "group3")
)

2.3. Differential expression analysis results (difexp)

A differential expression dataframe is optional but recommended if available. It facilitates downstream signature extraction, and signature comparison by rank-based test.

difexp is a dataframe with the following required columns:
“probe_id”, “feature_name”, “score”, along with at least one of the following: “p_value”, “q_value”, or “adj_p”.

“probe_id” is a unique identifier, usually a platform-specific identifider (e.g., probe IDs for Affymetrix microarrays, or aptamer IDs for SomaScan assays). If not provided, it will be automatically generated.
“feature_name” is a name that identifies each feature, examples include ENSEMBL IDs for transcripts, UniProt IDs for proteins, and Refmet IDs for metabolites. To better identify the features, it is recommended to add an additional annotation column(s), e.g., “gene_symbol”. For metabolite features, it is recommended to include multiple annotation columns if available, e.g., HMDB ID and InChI key.
“p_value”, “q_value”, or “adj_p” refers to the p- or q-value representing the significance of each feature.
“score” is a numeric value that indicates the importance or significance of a feature. Depending on how the signature was derived, this can be the t-statistics, log-fold change, Z-score, or other summary statistics.

Here we use an example from the differential expression analysis using the limma package.

# Version reading from a txt file
# difexp <- read.table(
#   file.path(
#     system.file("extdata", package = "OmicSignature"),
#     "difmatrix_Myc_mice_liver_24m.txt"
#   ),
#   header = TRUE, sep = "\t", stringsAsFactors = FALSE
# )
# Version reading from a binary file
difexp <- readRDS(
  file.path(
    system.file("extdata", package = "OmicSignature"),
    "difmatrix_Myc_mice_liver_24m.rds"
  )
)
head(difexp)
#>   Probe.ID  logFC AveExpr      t P.Value adj.P.Val      b            ensembl
#> 1 10345228 -0.167   7.106 -1.470   0.186     0.560 -5.866 ENSMUSG00000103746
#> 2 10354534  0.041   4.351  0.520   0.620     0.870 -6.780 ENSMUSG00000060715
#> 3 10354529 -0.175   4.955 -0.941   0.379     0.731 -6.458 ENSMUSG00000043629
#> 4 10346337  0.025   8.621  0.188   0.857     0.962 -6.911 ENSMUSG00000038323
#> 5 10353792 -0.025   6.063 -0.284   0.785     0.936 -6.885 ENSMUSG00000045815
#> 6 10350848 -0.055   7.595 -0.630   0.549     0.836 -6.712 ENSMUSG00000049881
#>     gene_symbol
#> 1 1700001G17Rik
#> 2 1700019A02Rik
#> 3 1700019D03Rik
#> 4 1700066M21Rik
#> 5 1700101I19Rik
#> 6 2810025M15Rik

Manually change the column names to match the requirement. The built-in function replaceDifexpCol() is designed to replace some frequently used alternative column names.

difexp <- difexp %>% rename(feature_name = ensembl)
colnames(difexp) <- replaceDifexpCol(colnames(difexp))
head(difexp)
#>   probe_id  logfc  mean  score p_value adj_p      b       feature_name
#> 1 10345228 -0.167 7.106 -1.470   0.186 0.560 -5.866 ENSMUSG00000103746
#> 2 10354534  0.041 4.351  0.520   0.620 0.870 -6.780 ENSMUSG00000060715
#> 3 10354529 -0.175 4.955 -0.941   0.379 0.731 -6.458 ENSMUSG00000043629
#> 4 10346337  0.025 8.621  0.188   0.857 0.962 -6.911 ENSMUSG00000038323
#> 5 10353792 -0.025 6.063 -0.284   0.785 0.936 -6.885 ENSMUSG00000045815
#> 6 10350848 -0.055 7.595 -0.630   0.549 0.836 -6.712 ENSMUSG00000049881
#>     gene_symbol
#> 1 1700001G17Rik
#> 2 1700019A02Rik
#> 3 1700019D03Rik
#> 4 1700066M21Rik
#> 5 1700101I19Rik
#> 6 2810025M15Rik

2.4. Create the `OmicSignature` object

OmS <- OmicSignature$new(
  metadata = metadata,
  signature = signature,
  difexp = difexp
)
#>   [Success] OmicSignature object Myc_reduce_mice_liver_24m created.

Set print_message = TRUE to see the messages.

OmS <- OmicSignature$new(
  metadata = metadata,
  signature = signature,
  difexp = difexp,
  print_message = TRUE
)
#>   --Required attributes for metadata: signature_name, phenotype, organism, direction_type, assay_type --
#>   [Success] Metadata is saved. 
#>   [Success] Signature is valid. 
#>   [Success] difexp is valid. 
#>   [Success] OmicSignature object Myc_reduce_mice_liver_24m created.

See the created object information:

print(OmS)
#> Signature Object: 
#>   Metadata: 
#>     adj_p_cutoff = 0.05 
#>     assay_type = transcriptomics 
#>     covariates = none 
#>     description = mice Myc haploinsufficient (Myc(+/-)) 
#>     direction_type = bi-directional 
#>     keywords = Myc, KO, longevity 
#>     organism = Mus musculus 
#>     others = C57BL/6 
#>     phenotype = Myc_reduce 
#>     platform = transcriptomics by array 
#>     PMID = 25619689 
#>     sample_type = liver 
#>     score_cutoff = 5 
#>     signature_name = Myc_reduce_mice_liver_24m 
#>     year = 2015 
#>   Metadata user defined fields: 
#>     animal_strain = C57BL/6 
#>   Signature: 
#>     Length (15)
#>     Class (character)
#>     Mode (character)
#>   Differential Expression Data: 
#>     884 x 9

Use new criteria to extract significant features:
(this does not change the signature saved in the object)

OmS$extractSignature("abs(score) > 10; adj_p < 0.01")
#>   probe_id       feature_name   score direction
#> 1 10349648 ENSMUSG00000004552  14.762         +
#> 2 10345762 ENSMUSG00000026072 -13.543         -
#> 3 10353192 ENSMUSG00000025932  10.487         +
#> 4 10355259 ENSMUSG00000061816 -10.315         -

Besides saving and reading OmicSignature object in .rds format, you can export the object as a json text file.

saveRDS(OmS, "Myc_reduce_mice_liver_24m_OmS.rds")
writeJson(OmS, "Myc_reduce_mice_liver_24m_OmS.json")

See more in “Functionalities of OmicSignature” section.

3. Create an `OmicSignature` from `difexp` and `metadata`

You can by-pass the generating signature process once you are an expert. Simply provide cutoffs (e.g. adj_p_cutoff and score_cutoff) in the metadata, make sure difexp has “adj_p” and “score” columns, and use OmicSigFromDifexp() to automatically extract significant features and create the OmicSignature object.

OmS1 <- OmicSigFromDifexp(difexp, metadata)
#> -- criterias used to extract signatures:  abs(score) >= 5; adj_p <= 0.05 . 
#> 
#>   [Success] OmicSignature object Myc_reduce_mice_liver_24m created.
OmS1
#> Signature Object: 
#>   Metadata: 
#>     adj_p_cutoff = 0.05 
#>     assay_type = transcriptomics 
#>     covariates = none 
#>     description = mice Myc haploinsufficient (Myc(+/-)) 
#>     direction_type = bi-directional 
#>     keywords = Myc, KO, longevity 
#>     organism = Mus musculus 
#>     others = C57BL/6 
#>     phenotype = Myc_reduce 
#>     platform = transcriptomics by array 
#>     PMID = 25619689 
#>     sample_type = liver 
#>     score_cutoff = 5 
#>     signature_name = Myc_reduce_mice_liver_24m 
#>     year = 2015 
#>   Metadata user defined fields: 
#>     animal_strain = C57BL/6 
#>   Signature: 
#>     Length (15)
#>     Class (character)
#>     Mode (character)
#>   Differential Expression Data: 
#>     884 x 10

Vanessa Mengze Li

05/15/2025

1. Cheatsheet

2. Create an `OmicSignature` Object Step-by-Step

2.1. Metadata

2.1.1 “phenotype”

2.1.2 “organism

2.1.3 “sample_type” and “platform”

2.1.4 “direction_type”

2.1.5 “assay_type”

2.2. Signature

2.3. Differential expression analysis results (difexp)

2.4. Create the `OmicSignature` object

3. Create an `OmicSignature` from `difexp` and `metadata`

Create OmicSignature

Vanessa Mengze Li

05/15/2025

1. Cheatsheet

2. Create an OmicSignature Object Step-by-Step

2.1. Metadata

2.1.1 “phenotype”

2.1.2 “organism

2.1.3 “sample_type” and “platform”

2.1.4 “direction_type”

2.1.5 “assay_type”

2.2. Signature

2.3. Differential expression analysis results (difexp)

2.4. Create the OmicSignature object

3. Create an OmicSignature from difexp and metadata

2. Create an `OmicSignature` Object Step-by-Step

2.4. Create the `OmicSignature` object

3. Create an `OmicSignature` from `difexp` and `metadata`