Published by De Gruyter November 16, 2019

EBADIMEX: an empirical Bayes approach to detect joint differential expression and methylation and to classify samples

Tobias Madsen, Michał Świtnicki, Malene Juul and Jakob Skou Pedersen

Abstract

DNA methylation and gene expression are interdependent and are both implicated in cancer development and progression, and many individual biomarkers have been discovered. A joint analysis of the two data types can potentially lead to biological insights that separate analyses would miss. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples, it is important to accurately model the interactions between the two data types. Here, we present EBADIMEX for jointly identifying differential expression and methylation and classifying samples. The moderated t-test, widely used with empirical Bayes priors in current differential expression methods, is generalised to a multivariate setting by developing: (1) a moderated Welch t-test for equality of means with unequal variances; (2) a moderated F-test for equality of variances; and (3) a multivariate test for equality of means with equal variances. This leads to parametric models with prior distributions for the parameters, which allow fast evaluation and robust analysis of small data sets. EBADIMEX is demonstrated on simulated data as well as on a large breast cancer (BRCA) cohort from TCGA. We show that the use of empirical Bayes priors and moderated tests works particularly well on small data sets.
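The starting point named in the abstract, the moderated t-test of Smyth (2004), can be illustrated with a minimal Python sketch of its classical two-group, equal-variance form (the EBADIMEX package itself is written in R). This is not one of the three tests EBADIMEX develops; the moderated Welch t-test, F-test and multivariate test are not shown. The sketch assumes the prior degrees of freedom `d0` and prior variance `s0_sq` have already been estimated across genes (that step is omitted), and the function name `moderated_t` is hypothetical.

```python
import numpy as np

def moderated_t(x, y, d0, s0_sq):
    """Two-group moderated t-statistic in the style of Smyth (2004).

    x, y  : one gene's values in group 1 and group 2.
    d0    : prior degrees of freedom (assumed already estimated across genes).
    s0_sq : prior variance (assumed already estimated across genes).
    Returns (t statistic, degrees of freedom).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2  # residual degrees of freedom
    # Pooled sample variance under the equal-variance model
    s_sq = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / d
    # Moderated (posterior) variance: shrink s^2 toward the prior s0^2
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    se = np.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
    t = (x.mean() - y.mean()) / se
    # Under the null hypothesis t follows a t-distribution with d0 + d df
    return t, d0 + d
```

With `d0 = 0` the statistic reduces to the ordinary pooled t-test; as `d0` grows, the gene-wise variance is pulled toward `s0_sq` and the degrees of freedom increase, which is what stabilises inference on small data sets.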

Funding source: Independent Research Fund Denmark

Award Identifier / Grant number: 7016-00379

Funding source: Sapere Aude

Award Identifier / Grant number: 12-126439

Funding source: Innovation Fund Denmark

Award Identifier / Grant number: 10-092320/DSF

Funding statement: This study was supported by Independent Research Fund Denmark, Grant Number: 7016-00379, Sapere Aude, Grant Number: 12-126439 and Innovation Fund Denmark, Grant Number: 10-092320/DSF.

Appendix

A Statistical notes

A.1 Huber Loss

To limit the influence of any single data type (expression or methylation) when classifying samples, we use the Huber loss instead of the log-density.

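If $X \sim N(\mu, \sigma^2)$, the log-density is quadratic in $x - \mu$:

$$\log f_X(x) = -\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2} + C,$$

where $C$ does not depend on $x$. The Huber loss $L(x)$ replaces this quadratic term: it agrees with it when $x$ lies within $k$ standard deviations of $\mu$ and is linear in $|x-\mu|$ beyond that,

$$L(x) = \begin{cases} -\dfrac{1}{2}\dfrac{(x-\mu)^2}{\sigma^2}, & |x-\mu| \le k\sigma, \\[6pt] \dfrac{-2k\sigma|x-\mu| + (k\sigma)^2}{2\sigma^2}, & |x-\mu| > k\sigma. \end{cases}$$

The two pieces have equal value ($-k^2/2$) and equal slope at $|x-\mu| = k\sigma$, so observations far from $\mu$ are penalised linearly rather than quadratically.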

We use k = 2.5 in our data analyses.
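The piecewise definition of $L(x)$ translates directly into code. The following is a minimal Python sketch, not taken from the EBADIMEX package; it keeps the log-density sign convention used above (larger values mean a better fit), and the function name `huber_log_density` is hypothetical.

```python
import numpy as np

def huber_log_density(x, mu, sigma, k=2.5):
    """Huber-type replacement for the Gaussian log-density (up to a constant).

    Quadratic, -(x - mu)^2 / (2 sigma^2), within k standard deviations of mu;
    linear in |x - mu| beyond that, so outlying values are penalised less.
    """
    r = np.abs(np.asarray(x, dtype=float) - mu)
    quad = -0.5 * r**2 / sigma**2
    lin = (-2.0 * k * sigma * r + (k * sigma) ** 2) / (2.0 * sigma**2)
    return np.where(r <= k * sigma, quad, lin)
```

For example, with $\mu = 0$, $\sigma = 1$ and $k = 2.5$, a value 10 standard deviations away contributes $-21.875$ instead of the Gaussian $-50$, so a single extreme expression or methylation measurement cannot dominate the classification score.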

A.2 Normalization

Let $R_{gi}$ denote the raw read count for gene $g$ in sample $i$. The read counts are divided by a sample-specific scaling factor $K_i$ and then log-transformed:

$$E_{gi} = \log\frac{R_{gi}}{K_i}.$$

We implement two normalization strategies: total count (TC) and upper quartile (UQ). In TC normalization, $K_i$ is the total library size of sample $i$:

$$K_i = \sum_g R_{gi}.$$

In UQ normalization, $K_i$ is the upper quartile of the counts in sample $i$, $\{R_{gi}\}_g$. UQ normalization is the recommended method (Bullard et al., 2010).
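Both strategies can be sketched in a few lines of Python, assuming a genes × samples count matrix. The function name `normalize_counts` is hypothetical, and `np.percentile` with its default linear interpolation is only one of several possible definitions of the upper quartile; the exact definition used by the R package is not stated here.

```python
import numpy as np

def normalize_counts(R, method="UQ"):
    """Compute E[g, i] = log(R[g, i] / K_i) for a genes x samples count matrix R.

    method="TC": K_i is the total count (library size) of sample i.
    method="UQ": K_i is the upper quartile (75th percentile) of sample i's counts.
    """
    R = np.asarray(R, dtype=float)
    if method == "TC":
        K = R.sum(axis=0)                     # one factor per sample (column)
    elif method == "UQ":
        K = np.percentile(R, 75, axis=0)      # linear interpolation by default
    else:
        raise ValueError("method must be 'TC' or 'UQ'")
    # Zero counts give -inf after the log; this sketch does not handle them.
    with np.errstate(divide="ignore"):
        return np.log(R / K)                  # K broadcasts across the columns
```

The UQ factor depends only on the bulk of moderately expressed genes, whereas the TC factor can be dominated by a handful of very highly expressed genes.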

A.3 Filtering

We filter out lowly expressed genes using the approach of Ding et al. (2015). For each gene we compute a specified quantile of its expression values (by default the median, q = 0.5). Typically, a large group of genes displays low median expression (Supplementary Figure S6), and from a histogram of these per-gene quantiles one could split the genes into two groups by setting a threshold manually. To guide the choice of threshold, we fit a two-component normal mixture to the per-gene quantiles: genes whose posterior probability of belonging to the lowly expressed component exceeds one half are excluded from subsequent analysis.
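The filtering rule can be sketched in Python as follows. The text does not say which routine fits the mixture, so this sketch assumes a plain hand-written EM algorithm with a fixed number of iterations and initial component means at the quartiles of the per-gene summaries; the function names are hypothetical. It also assumes the two groups are reasonably well separated, since EM for normal mixtures can otherwise collapse a component.

```python
import numpy as np

def _responsibilities(m, w, mu, sd):
    # Component densities up to the shared 1/sqrt(2*pi) factor, which cancels.
    dens = w * np.exp(-0.5 * ((m[:, None] - mu) / sd) ** 2) / sd
    return dens / dens.sum(axis=1, keepdims=True)

def filter_low_expression(E, q=0.5, n_iter=200):
    """Return a boolean mask over genes: True = keep, False = lowly expressed.

    E : genes x samples matrix of log expression values.
    q : per-gene quantile used as the expression summary (median by default).
    """
    m = np.quantile(E, q, axis=1)             # one summary value per gene
    # Assumed initialisation: component means at the quartiles of m
    mu = np.quantile(m, [0.25, 0.75])
    sd = np.full(2, m.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):                   # EM for a 2-component normal mixture
        resp = _responsibilities(m, w, mu, sd)           # E-step
        nk = resp.sum(axis=0)                            # M-step
        w = nk / len(m)
        mu = (resp * m[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (m[:, None] - mu) ** 2).sum(axis=0) / nk)
    resp = _responsibilities(m, w, mu, sd)    # posteriors under the final fit
    low = np.argmin(mu)                       # component with the lower mean
    return resp[:, low] <= 0.5                # drop genes with P(low) > 1/2
```

The 0.5 cut-off is the rule stated in the text; the fitted mixture only replaces the manual choice of threshold on the histogram.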

A.4 Kullback-Leibler divergence

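The Kullback-Leibler divergence can be used as an alternative ranking criterion to p-values. Writing $\phi(x;\mu,\sigma^2)$ for the normal density, the Kullback-Leibler divergence between two normal distributions $Q \sim N(\mu_1,\sigma_1^2)$ and $P \sim N(\mu_2,\sigma_2^2)$ is

$$\begin{aligned} \mathrm{KL}(Q \,\|\, P) &= \int \phi(x;\mu_1,\sigma_1^2)\,\log\frac{\phi(x;\mu_1,\sigma_1^2)}{\phi(x;\mu_2,\sigma_2^2)}\,dx \\ &= \int \phi(x;0,\sigma_1^2)\,\log\frac{\phi(x;0,\sigma_1^2)}{\phi(x;\mu_2-\mu_1,\sigma_2^2)}\,dx \\ &= \int \phi(x;0,\sigma_1^2)\left(\log\frac{\sigma_2}{\sigma_1} + \frac{(x-\mu_2+\mu_1)^2}{2\sigma_2^2} - \frac{x^2}{2\sigma_1^2}\right)dx \\ &= \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}. \end{aligned}$$

The second line shifts $x$ by $\mu_1$, and the last line uses $E[x^2] = \sigma_1^2$ and $E[(x-\mu_2+\mu_1)^2] = \sigma_1^2 + (\mu_1-\mu_2)^2$ for $x \sim N(0,\sigma_1^2)$.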
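The closed form can be checked numerically. The Python sketch below compares it with a Riemann-sum approximation of the defining integral; the function names and the integration grid (±12 standard deviations around $\mu_1$) are illustrative assumptions.

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL(Q || P) for Q = N(mu1, s1^2) and P = N(mu2, s2^2)."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def kl_normal_numeric(mu1, s1, mu2, s2, n=200001):
    """The same quantity by a Riemann sum over a wide grid, as a sanity check."""
    x = np.linspace(mu1 - 12 * s1, mu1 + 12 * s1, n)
    dx = x[1] - x[0]
    log_q = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi))
    log_p = -0.5 * ((x - mu2) / s2) ** 2 - np.log(s2 * np.sqrt(2 * np.pi))
    # Integrand: q(x) * (log q(x) - log p(x))
    return np.sum(np.exp(log_q) * (log_q - log_p)) * dx
```

The divergence is not symmetric: `kl_normal(0, 1, 1, 2)` is about 0.443, while `kl_normal(1, 2, 0, 1)` is about 1.307, so the order of the two groups matters when it is used for ranking.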

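More generally, for $d$-dimensional multivariate normals $Q \sim N(\mu_1,\Sigma_1)$ and $P \sim N(\mu_2,\Sigma_2)$, we have

$$\mathrm{KL}(Q \,\|\, P) = \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^\top \Sigma_2^{-1}(\mu_2-\mu_1)\right].$$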
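A minimal Python sketch of the multivariate formula is given below; the function name `kl_mvnormal` is hypothetical. It uses log-determinants and linear solves rather than explicit inverses for numerical stability. For $d = 1$ it reduces to the univariate expression, and for diagonal covariances it equals the sum of the per-coordinate univariate divergences.

```python
import numpy as np

def kl_mvnormal(mu1, S1, mu2, S2):
    """KL(Q || P) for Q = N(mu1, S1) and P = N(mu2, S2) in d dimensions."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    S1 = np.atleast_2d(np.asarray(S1, dtype=float))
    S2 = np.atleast_2d(np.asarray(S2, dtype=float))
    d = mu1.size
    _, logdet1 = np.linalg.slogdet(S1)          # log|Sigma_1|
    _, logdet2 = np.linalg.slogdet(S2)          # log|Sigma_2|
    diff = mu2 - mu1
    trace_term = np.trace(np.linalg.solve(S2, S1))   # tr(Sigma_2^{-1} Sigma_1)
    maha = diff @ np.linalg.solve(S2, diff)          # (mu2-mu1)^T Sigma_2^{-1} (mu2-mu1)
    return 0.5 * (logdet2 - logdet1 - d + trace_term + maha)
```

For instance, $d = 2$ would apply when the expression and methylation values of one gene are summarised by a bivariate normal in each group.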

B R-package

All functionality of EBADIMEX is implemented as an R-package. The package, with accompanying tutorial, is available at https://github.com/TobiasMadsen/EBADIMEX.

References

Aryee, M. J., A. E. Jaffe, H. Corrada-Bravo, C. Ladd-Acosta, A. P. Feinberg, K. D. Hansen and R. A. Irizarry (2014): “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays,” Bioinformatics, 30, 1363–1369. doi:10.1093/bioinformatics/btu049

Bailer-Jones, C. and K. Smith (2011): Combining probabilities. Data Processing and Analysis Consortium (DPAC), GAIA-C8-TN-MPIA-CBJ-053.

Bibikova, M., B. Barnes, C. Tsan, V. Ho, B. Klotzle, J. M. Le, D. Delano, L. Zhang, G. P. Schroth, K. L. Gunderson, J. B. Fan and R. Shen (2011): “High density DNA methylation array with single CpG site resolution,” Genomics, 98, 288–295. doi:10.1016/j.ygeno.2011.07.007

Breiman, L., A. Cutler, A. Liaw and M. Wiener (2006): “randomForest: Breiman and Cutler’s random forests for classification and regression.”

Brenet, F., M. Moh, P. Funk, E. Feierstein, A. J. Viale, N. D. Socci and J. M. Scandura (2011): “DNA methylation of the first exon is tightly linked to transcriptional silencing,” PLoS One, 6, e14524. doi:10.1371/journal.pone.0014524

Bullard, J. H., E. Purdom, K. D. Hansen and S. Dudoit (2010): “Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments,” BMC Bioinformatics, 11, 94. doi:10.1186/1471-2105-11-94

Dedeurwaerder, S., M. Defrance, E. Calonne, H. Denis, C. Sotiriou and F. Fuks (2011): “Evaluation of the Infinium Methylation 450k Technology,” Epigenomics, 3, 771–784. doi:10.2217/epi.11.105

Demissie, M., B. Mascialino, S. Calza and Y. Pawitan (2008): “Unequal group variances in microarray data analyses,” Bioinformatics, 24, 1168–1174. doi:10.1093/bioinformatics/btn100

Ding, J., M. K. McConechy, H. M. Horlings, G. Ha, F. C. Chan, T. Funnell, S. C. Mullaly, J. Reimand, A. Bashashati, G. D. Bader, D. Huntsman, S. Aparicio, A. Condon and S. P. Shah (2015): “Systematic analysis of somatic mutations impacting gene expression in 12 tumour types,” Nat. Commun., 6, 8554. doi:10.1038/ncomms9554

Dixon, W. J. and J. W. Tukey (1968): “Approximate behavior of the distribution of Winsorized t (trimming/winsorization 2),” Technometrics, 10, 83–98. doi:10.2307/1266226

Du, P., X. Zhang, C.-C. Huang, N. Jafari, W. A. Kibbe, L. Hou and S. M. Lin (2010): “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis,” BMC Bioinformatics, 11, 587. doi:10.1186/1471-2105-11-587

Esteller, M. (2008): “Epigenetics in cancer,” N. Engl. J. Med., 358, 1148–1159. doi:10.1056/NEJMra072067

Fisher, R. A. (1932): Statistical methods for research workers, Oliver and Boyd, Edinburgh.

Gelman, A. (2011): arm: Data analysis using regression and multilevel/hierarchical models. http://cran.r-project.org/web/packages/arm.

Grossman, R. L., A. P. Heath, V. Ferretti, H. E. Varmus, D. R. Lowy, W. A. Kibbe and L. M. Staudt (2016): “Toward a shared vision for cancer genomic data,” N. Engl. J. Med., 375, 1109–1112. doi:10.1056/NEJMp1607591

Huber, P. and E. Ronchetti (2009): Robust statistics, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi:10.1002/9780470434697

Jeong, J., L. Li, Y. Liu, K. P. Nephew, T. H.-M. Huang and C. Shen (2010): “An empirical Bayes model for gene expression and methylation profiles in antiestrogen resistant breast cancer,” BMC Med. Genomics, 3, 55. doi:10.1186/1755-8794-3-55

Jjingo, D., A. B. Conley, S. V. Yi, V. V. Lunyak and I. K. Jordan (2012): “On the presence and role of human gene-body DNA methylation,” Oncotarget, 3, 462–474. doi:10.18632/oncotarget.497

Jones, P. A. (2012): “Functions of DNA methylation: islands, start sites, gene bodies and beyond,” Nat. Rev. Genet., 13, 484. doi:10.1038/nrg3230

Jones, P. A. and S. B. Baylin (2007): “The epigenomics of cancer,” Cell, 128, 683–692. doi:10.1016/j.cell.2007.01.029

Karatzoglou, A., A. Smola and K. Hornik (2013): “kernlab: Kernel-based machine learning lab.”

Kass, S. U., N. Landsberger and A. P. Wolffe (1997): “DNA methylation directs a time-dependent repression of transcription initiation,” Curr. Biol., 7, 157–165. doi:10.1016/S0960-9822(97)70086-1

Kristensen, V. N., O. C. Lingjærde, H. G. Russnes, H. K. M. Vollan, A. Frigessi and A.-L. Børresen-Dale (2014): “Principles and methods of integrative genomic analyses in cancer,” Nat. Rev. Cancer, 14, 299–313. doi:10.1038/nrc3721

Kuhn, M. (2015): “Caret: classification and regression training,” Astrophysics Source Code Library.

Levenson, V. V. (2010): “DNA methylation as a universal biomarker,” Expert. Rev. Mol. Diagn., 10, 481–488. doi:10.1586/erm.10.17

List, M., A.-C. Hauschild, Q. Tan, T. A. Kruse, J. Baumbach and R. Batra (2014): “Classification of breast cancer subtypes by combining gene expression and DNA methylation data,” J. Integr. Bioinform., 11, 1–14. doi:10.1515/jib-2014-236

Love, M. I., W. Huber and S. Anders (2014): “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2,” Genome Biol., 15, 550. doi:10.1186/s13059-014-0550-8

Ma, K., B. Cao and M. Guo (2016): “The detective, prognostic, and predictive value of DNA methylation in human esophageal squamous cell carcinoma,” Clin. Epigenetics, 8, 43. doi:10.1186/s13148-016-0210-9

McCarthy, D. J., Y. Chen and G. K. Smyth (2012): “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation,” Nucleic Acids Res., 40, 4288–4297. doi:10.1093/nar/gks042

Mendizabal, I., J. Zeng, T. E. Keller and S. V. Yi (2017): “Body-hypomethylated human genes harbor extensive intragenic transcriptional activity and are prone to cancer-associated dysregulation,” Nucleic Acids Res., 45, 4390–4400. doi:10.1093/nar/gkx020

Meyer, D., E. Dimitriadou, K. Hornik, A. Weingessel and F. Leisch (2016): e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071), TU Wien, 2015, R package version, p. 1–6.

Morris, T. J., L. M. Butcher, A. Feber, A. E. Teschendorff, A. R. Chakravarthy, T. K. Wojdacz and S. Beck (2013): “ChAMP: 450k chip analysis methylation pipeline,” Bioinformatics, 30, 428–430. doi:10.1093/bioinformatics/btt684

R Core Team (2017): R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.

Ritchie, M. E., B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi and G. K. Smyth (2015): “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Res., 43, e47. doi:10.1093/nar/gkv007

Scott, D. W. (2008): Multivariate density estimation: theory, practice, and visualization, John Wiley & Sons, Inc., Hoboken, NJ, USA.

Smith, A. D., D. Roda and T. A. Yap (2014): “Strategies for modern biomarker and drug development in oncology,” J. Hematol. Oncol., 7, 70. doi:10.1186/s13045-014-0070-8

Smith, Z. D. and A. Meissner (2013): “DNA methylation: roles in mammalian development,” Nat. Rev. Genet., 14, 204–220. doi:10.1038/nrg3354

Smyth, G. K. (2004): “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Stat. Appl. Genet. Mol. Biol., 3, 1–25. doi:10.2202/1544-6115.1027

Strand, S. H., T. F. Orntoft and K. D. Sorensen (2014): “Prognostic DNA methylation markers for prostate cancer,” Int. J. Mol. Sci., 15, 16544–16576. doi:10.3390/ijms150916544

Świtnicki, M. P., M. Juul, T. Madsen, K. D. Sørensen and J. S. Pedersen (2016): “PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification,” Bioinformatics, 32, 1353–1365. doi:10.1093/bioinformatics/btv758

Weinstein, J. N., E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander and J. M. Stuart (2013): “The Cancer Genome Atlas Pan-Cancer analysis project,” Nat. Genet., 45, 1113–1120. doi:10.1038/ng.2764

Wu, D., J. Gu and M. Q. Zhang (2013): “FastDMA: an Infinium HumanMethylation450 BeadChip analyzer,” PLoS One, 8, e74275. doi:10.1371/journal.pone.0074275

Yang, X., H. Han, D. D. De Carvalho, F. D. Lay, P. A. Jones and G. Liang (2014): “Gene body methylation can alter gene expression and is a therapeutic target in cancer,” Cancer Cell, 26, 577–590. doi:10.1016/j.ccr.2014.07.028

Zhong, D. and H. Cen (2017): “Aberrant promoter methylation profiles and association with survival in patients with hepatocellular carcinoma,” OncoTargets Ther., 10, 2501. doi:10.2147/OTT.S128058


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0050).


Published Online: 2019-11-16

© 2019 Walter de Gruyter GmbH, Berlin/Boston
