Result: Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.

Title:
Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data.
Authors:
Zhao K; Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada., Oualkacha K; Département de Mathématiques, Université du Québec à Montréal, Montreal, Quebec, Canada., Zeng Y; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada., Shen C; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada., Klein K; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada., Lakhal-Chaieb L; Département de Mathématiques et de Statistique, Université Laval, Quebec, Quebec, Canada., Labbe A; Département de Sciences de la Décision, HEC Montréal, Montreal, Quebec, Canada., Pastinen T; Genomic Medicine Center, Children's Mercy, Independence, Missouri, USA., Hudson M; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada.; Department of Medicine, McGill University, Montreal, Quebec, Canada., Colmegna I; Department of Medicine, McGill University, Montreal, Quebec, Canada.; The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada., Bernatsky S; Department of Medicine, McGill University, Montreal, Quebec, Canada.; The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada., Greenwood CMT; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada.; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada.; Department of Human Genetics and Gerald Bronfman Department of Oncology, McGill University, Montreal, Quebec, Canada.
Source:
Statistics in medicine [Stat Med] 2024 Sep 10; Vol. 43 (20), pp. 3899-3920. Date of Electronic Publication: 2024 Jun 26.
Publication Type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Journal Info:
Publisher: Wiley; Country of Publication: England; NLM ID: 8215016; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1097-0258 (Electronic); Linking ISSN: 02776715; NLM ISO Abbreviation: Stat Med; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Chichester ; New York : Wiley, c1982-
Grant Information:
MOP 130344 Canada CIHR; 2541 4128 Digital Research Alliance of Canada; 2017 (B/CB) Genome Canada; RGPIN-2024-06287 Natural Sciences and Engineering Research Council of Canada
Contributed Indexing:
Keywords: DNA methylation; EM algorithm; additive dispersion; binomial; measurement error; multiplicative dispersion
Substance Nomenclature:
0 (Sulfites)
OJ9787WBLU (hydrogen sulfite)
Entry Date(s):
Date Created: 20240627; Date Completed: 20240820; Latest Revision: 20250703
Update Code:
20250703
DOI:
10.1002/sim.10149
PMID:
38932470
Database:
MEDLINE

Further information

Motivated by a DNA methylation application, this article addresses the problem of fitting, and drawing inference from, a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variation, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R/Bioconductor package "SOMNiBUS."
(© 2024 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.)
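
Illustrative note (not part of the record): the abstract distinguishes multiplicative from additive dispersion and mentions measurement error without giving formulas. The Python sketch below is one simplified reading of those ideas, not the authors' model and not the SOMNiBUS API. It assumes that, for a single CpG with read depth n, the methylated count is quasi-binomial given a subject-level random effect u ~ N(0, sigma^2) on the logit scale (additive dispersion), with conditional variance inflated by a scale factor phi (multiplicative dispersion). The function names and the error-rate symbols p0 (false-positive call rate) and p1 (true-positive call rate) are hypothetical labels chosen for this example.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def observed_prob(pi, p0, p1):
    """Probability that a read is *called* methylated when the true
    methylation probability is pi (assumed error model: p0 = false-positive
    rate, p1 = true-positive rate)."""
    return pi * p1 + (1.0 - pi) * p0


def logit_normal_moments(eta, sigma, n_grid=2001):
    """E[p] and E[p^2] for p = expit(eta + u), u ~ N(0, sigma^2),
    computed by a simple weighted sum over a grid of standard-normal z."""
    if sigma == 0.0:
        p = expit(eta)
        return p, p * p
    zs = [-8.0 + 16.0 * k / (n_grid - 1) for k in range(n_grid)]
    ws = [math.exp(-0.5 * z * z) for z in zs]  # unnormalised N(0,1) weights
    total = sum(ws)
    m1 = sum(w * expit(eta + sigma * z) for w, z in zip(ws, zs)) / total
    m2 = sum(w * expit(eta + sigma * z) ** 2 for w, z in zip(ws, zs)) / total
    return m1, m2


def dispersion_ratio(n, eta, sigma, phi):
    """Marginal Var(Y) divided by the plain binomial variance n*m*(1-m).

    Assumed conditional model: E[Y|u] = n*p, Var(Y|u) = phi*n*p*(1-p).
    Law of total variance: Var(Y) = phi*n*E[p(1-p)] + n^2*Var(p).
    """
    m1, m2 = logit_normal_moments(eta, sigma)
    marginal_var = phi * n * (m1 - m2) + n * n * (m2 - m1 * m1)
    return marginal_var / (n * m1 * (1.0 - m1))


if __name__ == "__main__":
    for depth in (5, 20, 80):
        print(depth,
              round(dispersion_ratio(depth, 0.0, 0.0, 2.0), 3),  # phi only
              round(dispersion_ratio(depth, 0.0, 1.0, 1.0), 3))  # sigma only
```

Under these assumptions, multiplicative dispersion alone inflates the variance by the same factor phi at every read depth, whereas additive dispersion alone produces an inflation that grows with read depth. That difference is one way to see why combining the two components allows more flexible dispersion patterns than either alone.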