For this tutorial we will simulate the input gene expression data, the cis and trans genes with known mediations, and we will run iEDGE to recover these mediations.

Setting up the iEDGE object

We start by generating the dataset. You can use construct_iEDGE to generate the iEDGE dataset from standard matrices (binary matrices for alteration data or numeric matrices for expression data).

Please note that this simulation is optimized for the evaluation of the iEDGE methology, and its only purpose is to demonstrate how to construct an iEDGE object from scratch for use in standard analysis.

library(iEDGE)

set.seed(1)

# define number of samples
n <- 50

# construct 'cn': binary alternations by samples matrix two
# independent alterations across 50 samples
cn1 <- sample(x= c(1,0), size = n, replace = T, prob = c(0.7, 0.3))
cn2 <- sample(x= c(1,0), size = n, replace = T, prob = c(0.5, 0.5))

cn <- rbind(cn1, cn2)
colnames(cn) <- paste0("sample_", 1:ncol(cn))

# construct 'cis' gene expressions: genes correlated with one or more
# alterations in 'cn'
cis <-  t(sapply(c(1,1,1,2,2,2), function(i) {
        3 + cn[i, ] + rnorm(n, 0, 0.3)
}))
rownames(cis) <- make.unique(paste0("cis", c(1,1,1,2,2,2)))

# construct 'trans-mediated' gene expression: gene correlated with
# one or more alteration in 'cn' and mediated through a cis gene

cis.m <- sample(rownames(cis), 10, replace = T)
trans.m <- t(sapply(1:length(cis.m), function(i)
        3 + cis[cis.m[i], ] + rnorm(n, 0, 0.3)
    ))
rownames(trans.m) <- make.unique(paste0("t.m.", cis.m))

# construct 'trans-direct' gene expression: trans gene correlated
# with one or more alteration in 'cn' but not mediated through cis
# genes, this would be similar to construction of cis genes
trans.d <-  t(sapply(c(1,1,1,2,2,2), function(i)
    3 + cn[i, ] + rnorm(n, 0, 0.6)
    ))
rownames(trans.d) <- make.unique(paste0("t.d.", c(1,1,1,2,2,2)))

# uncorrelated genes
others <- t(sapply(1:30, function(i) 3+rnorm(n, 0, 0.3)))
rownames(others) <- paste0("others_", 1:nrow(others))

# putting genes together in expression matrix
gep <- rbind(cis, trans.m, trans.d, others)
colnames(gep) <- paste0("sample_", 1:ncol(gep))

cisgenes <- list(cn1 = grep("cis1", rownames(cis), value = T),
                 cn2 = grep("cis2", rownames(cis), value = T))

# the direction column in cn.fdat specifies whether to compute
# one-sided or two-sided p-values in differential tests. In this
# simulation we expect an increase in expression with the presence of
# the alteration, hence a one-sided ("UP") p-value is computed.
cn.fdat <- data.frame(cnid = rownames(cn), direction = "UP")
rownames(cn.fdat) <- rownames(cn)

gep.fdat <- data.frame(geneid = rownames(gep))
rownames(gep.fdat) <- rownames(gep)

dat <- construct_iEDGE(cn, gep, cisgenes, transgenes = NA, cn.fdat, gep.fdat, 
                cn.pdat = NA, gep.pdat = NA)

Explanation of the iEDGE object components

The iEDGE object is a list consisting of:

  1. cn: Copy-Number of other binary alteration ExpressionSet:

    • exprs(dat$cn): the binary matrix encoding presence or absence of an alteration (c alterations by n samples)

    • fData(dat$cn): alteration annotation data frame (c alterations by g alteration annotation columns). Please note that if direction of differential expression test is dependent on the alteration, specify as such to be used in run_iEDGE, e.g. run_iEDGE(… cndir = “direction”, uptest = “UP”, downtest = “DOWN”) corresponds to reporting one-sided pvalues for significantly upregulated in genes for alterations in which fData(dat$cn)[, “direction”] == “UP”, or significantly downregulated genes for alterations in which fData(dat$cn)[, “direction”] == “DOWN”

    • pData(dat$cn): sample annotation data frame (n samples by l sample annotation columns)

  2. gep: The gene expression ExpressionSet:

    • exprs(dat$gep): gene expression matrix, assumed to be already log2 transformed (m genes by n samples)

    • fData(dat$gep): gene annotation data frame (m genes by j gene annotation columns)

    • pData(dat$gep): sample annotation data frame (n samples by k sample annotation columns)

  3. cisgenes: A list of character vectors specifying the genes that in the cis regions of each alteration, must be in the same order as the rows in exprs(dat$cn). For copy number alterations we defined the cis genes to be genes in the focal peaks of each amplification or deletion of interest.

  4. transgenes (optional): A list of character vectors specifying the genes outside the cis regions (trans genes). If not specified, all genes in the expression matrix that are not the cis genes are considered. Trans genes specification is not recommended for exploratory analysis but may be useful for testing mediation and pathway enrichment for particular target gene sets in hypothesis-driven approaches.

dat
## $cn
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 50 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: sample_1 sample_2 ... sample_50 (50 total)
##   varLabels: colid
##   varMetadata: labelDescription
## featureData
##   featureNames: cn1 cn2
##   fvarLabels: cnid direction
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## 
## $gep
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 52 features, 50 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: sample_1 sample_2 ... sample_50 (50 total)
##   varLabels: colid
##   varMetadata: labelDescription
## featureData
##   featureNames: cis1 cis1.1 ... others_30 (52 total)
##   fvarLabels: geneid
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## 
## $cisgenes
## $cisgenes$cn1
## [1] "cis1"   "cis1.1" "cis1.2"
## 
## $cisgenes$cn2
## [1] "cis2"   "cis2.1" "cis2.2"

Running iEDGE

Now that the iEDGE object is created, we can run the iEDGE pipeline on this dataset.

Running iEDGE with differential expression only:

This will create the results object with differential expression tables for cis and trans genes

# cis and trans differential expression analysis only
# no mediation analysis
# no html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".", 
                 cnid = "cnid", cndesc = "cnid", cndir = "direction", 
                 uptest = "UP", downtest = "DOWN",
                 gepid = "geneid", enrichment = FALSE, bipartite = FALSE, html = FALSE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
names(res)
## [1] "de"        "de.enrich" "pruning"   "ui"
head(res$de$cis$full)
##        geneid fold.change adj.P.Val.all        t high.class mean1.unlog
## cis1.1 cis1.1    2.132349  3.810924e-24 12.45441       case    15.10178
## cis1.2 cis1.2    2.037804  1.156082e-22 11.70850       case    15.71264
## cis1     cis1    1.916725  3.747522e-20 10.70101       case    15.44336
## cis2     cis2    2.045179  1.156082e-22 11.70744       case    15.49973
## cis2.1 cis2.1    1.918057  3.896869e-20 10.65738       case    15.27832
## cis2.2 cis2.2    1.900450  8.035568e-20 10.50649       case    15.25603
##        mean0.unlog sd0.unlog sd1.unlog cnid direction interaction_type
## cis1.1    6.542553  1.465369  3.168008  cn1        UP              cis
## cis1.2    7.212833  1.903196  3.605576  cn1        UP              cis
## cis1      7.670445  2.081563  3.150070  cn1        UP              cis
## cis2      7.075236  1.886268  3.559967  cn2        UP              cis
## cis2.1    7.446159  1.652364  3.521293  cn2        UP              cis
## cis2.2    7.512069  1.634062  3.505537  cn2        UP              cis
##            logFC  AveExpr      P.Value    adj.P.Val        B mean0.log
## cis1.1 1.0924433 3.634254 6.351540e-25 1.905462e-24 69.81907  2.891393
## cis1.2 1.0270153 3.703061 5.743513e-23 8.615269e-23 60.83393  3.004690
## cis1   0.9386434 3.715198 2.498348e-20 2.498348e-20 49.57859  3.076920
## cis2   1.0322269 3.410606 5.780412e-23 1.734124e-22 60.85680  2.977071
## cis2.1 0.9396453 3.447198 3.247391e-20 4.871087e-20 49.15384  3.052547
## cis2.2 0.9263412 3.452841 8.035568e-20 8.035568e-20 47.56241  3.063777
##        mean1.log   sd0.log   sd1.log n0 n1
## cis1.1  3.983836 0.2661433 0.2704973 16 34
## cis1.2  4.031706 0.3132603 0.3018580 16 34
## cis1    4.015564 0.3486787 0.2625479 16 34
## cis2    4.009298 0.3282196 0.3359340 29 21
## cis2.1  3.992192 0.2774623 0.3171413 29 21
## cis2.2  3.990118 0.2782353 0.3182936 29 21
head(res$de$trans$full)
##                geneid fold.change adj.P.Val.all        t high.class
## t.m.cis1.1 t.m.cis1.1    2.058719  1.113504e-09 8.257112       case
## t.m.cis1.2 t.m.cis1.2    2.048716  3.095161e-09 7.909354       case
## t.m.cis1     t.m.cis1    1.875885  3.747706e-09 7.773147       case
## t.d.1           t.d.1    2.068440  5.221347e-06 5.636337       case
## t.d.1.1       t.d.1.1    2.036324  7.517011e-06 5.514853       case
## t.d.1.2       t.d.1.2    1.910407  5.188124e-04 4.259915       case
##            mean1.unlog mean0.unlog sd0.unlog sd1.unlog cnid direction
## t.m.cis1.1   130.97073   62.936666 22.403932 43.210031  cn1        UP
## t.m.cis1.2   137.37791   68.380230 26.189159 37.396372  cn1        UP
## t.m.cis1     131.10487   70.856702 21.898342 30.521851  cn1        UP
## t.d.1         15.96287    7.664441  5.865252  7.642273  cn1        UP
## t.d.1.1       16.21092    7.938313  5.287850  6.876634  cn1        UP
## t.d.1.2       16.02314    8.130853  5.495517  8.454433  cn1        UP
##            interaction_type     logFC  AveExpr      P.Value    adj.P.Val
## t.m.cis1.1            trans 1.0417471 6.645719 4.544915e-11 2.227009e-09
## t.m.cis1.2            trans 1.0347199 6.729740 1.623003e-10 3.976358e-09
## t.m.cis1              trans 0.9075714 6.716427 2.676933e-10 4.372324e-09
## t.d.1                 trans 1.0485433 3.626562 6.926277e-07 8.484689e-06
## t.d.1.1               trans 1.0259674 3.666232 1.073859e-06 1.052382e-05
## t.d.1.2               trans 0.9338801 3.618542 8.470406e-05 6.917498e-04
##                     B mean0.log mean1.log   sd0.log   sd1.log n0 n1
## t.m.cis1.1 14.9623347  5.937331  6.979078 0.4059199 0.4280755 16 34
## t.m.cis1.2 13.7029102  6.026131  7.060851 0.5216744 0.3940067 16 34
## t.m.cis1   13.2079775  6.099279  7.006850 0.4695779 0.3438308 16 34
## t.d.1       5.4640829  2.913553  3.962096 0.7221742 0.5880397 16 34
## t.d.1.1     5.0346486  2.968574  3.994541 0.7366806 0.5798271 16 34
## t.d.1.2     0.7893726  2.983503  3.917383 0.7726423 0.7383662 16 34

Running iEDGE with differential expression only and HTML report:

This will create the results object with differential expression tables for cis and trans genes only, and an associated HTML report

# cis and trans differential expression analysis only
# no mediation analysis
# do html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".", 
                 cnid = "cnid", cndesc = "cnid", cndir = "direction", 
                 uptest = "UP", downtest = "DOWN",
                 gepid = "geneid", enrichment = FALSE, 
                 bipartite = FALSE, html = TRUE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
## Writing by alteration cis full DE tables...
## Writing cis boxplots...
## Writing trans boxplots...
## Writing by alteration cis DE tables...
## Writing by alteration trans DE tables...
## Writing summary table...
## Adding links to main table...
## Adding main table to main index...
## Done constructing html report at directory: ./testrun/html
# the result html page should be opened automatically, this displays the graphical view of results

Running iEDGE with differential expression and mediation analysis (bipartite) and HTML report:

This will create the results object with differential expression tables for cis and trans genes and mediation analysis, and an associated HTML report

# cis and trans differential expression analysis only
# do mediation analysis
# do html report
res <- run_iEDGE(dat = dat, header = "testrun", outdir = ".", 
                 cnid = "cnid", cndesc = "cnid", cndir = "direction", 
                 uptest = "UP", downtest = "DOWN",
                 gepid = "geneid", enrichment = FALSE, 
                 bipartite = TRUE, html = TRUE)
## Running iEDGE for data set: testrun
## Making Differential Expression tables...
## Performancing cis analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_cis_sig_fdr_0.25.txt
## Writing table to ./testrun/tables/testrun_cis_full.txt
## Performancing trans analysis...
## Running iEDGE differential expression for 2 alterations: 
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
## Writing table to ./testrun/tables/testrun_trans_sig_fdr_0.05.txt
## Done!
## Running pruning for  cn1 
## [1] 1
## [1] "number cis"
## [1] 3
## [1] "number trans"
## [1] 6
## Performing cis ordering...
## Performing trans ordering on 5 sets...
## 
## Running pruning for  cn2 
## [1] 2
## [1] "number cis"
## [1] 3
## [1] "number trans"
## [1] 10
## Performing cis ordering...
## Performing trans ordering on 8 sets...
## 
## Writing by alteration cis full DE tables...
## Writing cis boxplots...
## Writing trans boxplots...
## Writing by alteration cis DE tables...
## Writing by alteration trans DE tables...
## Writing summary table...
## Adding links to main table...
## Adding main table to main index...
## Writing bipartite graphs...
## Done constructing html report at directory: ./testrun/html
# significant cn-cis-trans connections found using mediation analysis
res$pruning$sig
## $cn1
##      cis      trans yind zind   taudiff      tau0         tau1  tauratio
## 1 cis1.2      t.d.1    2    4 0.5420271 1.0485433  0.506516245 0.5169334
## 3 cis1.1    t.d.1.2    1    6 1.2585709 0.9338801 -0.324690839 1.3476794
## 4   cis1   t.m.cis1    3    3 0.7631243 0.9075714  0.144447104 0.8408422
## 5 cis1.1 t.m.cis1.1    1    1 1.2319340 1.0417471 -0.190186947 1.1825654
## 6   cis1 t.m.cis1.2    3    2 1.0403221 1.0347199 -0.005602134 1.0054142
##          Z         S       pvalue          fdr
## 1 1.782062 0.3041573 3.736959e-02 1.345305e-01
## 3 3.035446 0.4146247 1.200901e-03 5.404054e-03
## 4 4.753845 0.1605278 9.979214e-07 5.987529e-06
## 5 6.287764 0.1959256 1.610360e-10 2.898648e-09
## 6 6.139045 0.1694599 4.150955e-10 3.735860e-09
## 
## $cn2
##       cis        trans yind zind   taudiff      tau0        tau1  tauratio
## 3  cis2.1      t.d.2.2    2    8 0.4916785 1.0756830  0.58400453 0.4570849
## 4    cis2     t.m.cis2    1    1 0.8209718 1.1270505  0.30607867 0.7284251
## 5  cis2.1   t.m.cis2.1    2    3 0.8365285 0.9352537  0.09872523 0.8944402
## 6  cis2.1 t.m.cis2.1.1    2    6 1.0545167 0.8939961 -0.16052060 1.1795540
## 7  cis2.1 t.m.cis2.1.2    2    7 0.9844624 0.8363706 -0.14809181 1.1770648
## 8  cis2.2   t.m.cis2.2    3    2 0.8995824 0.9720952  0.07251276 0.9254057
## 9    cis2   t.m.cis2.3    1    5 0.9507412 0.9825324  0.03179122 0.9676436
## 10   cis2   t.m.cis2.4    1    4 1.2855678 1.1221337 -0.16343419 1.1456459
##           Z         S       pvalue          fdr
## 3  1.758017 0.2796779 3.937231e-02 1.476461e-01
## 4  4.995617 0.1643384 2.932394e-07 1.256740e-06
## 5  5.773441 0.1448925 3.883433e-09 2.912574e-08
## 6  6.737861 0.1565061 8.036751e-12 1.205513e-10
## 7  5.507752 0.1787412 1.817228e-08 1.090337e-07
## 8  6.483173 0.1387565 4.490663e-11 4.490663e-10
## 9  5.116760 0.1858092 1.554143e-07 7.770716e-07
## 10 7.032506 0.1828037 1.014282e-12 3.042846e-11

The result html page should open automatically, and it will include an additional column “bipartite” with hyperlinks to the graphical display of the mediation analysis.

iEDGE HTML report

The HTML report of the final run is contained here