Result: Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures

Title:

Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures

Authors:

Gibson, Elizabeth Atkeson

Publication Year:

2021

Collection:

Columbia University: Academic Commons

Subject Terms:

Environmental health, Biometry, Pregnancy, Machine learning--Statistical methods, Bayesian statistical decision theory--Industrial applications, Newborn infants--Health and hygiene, Environmental toxicology

Document Type:

Dissertation thesis

Language:

English

DOI:

10.7916/d8-tnfc-et36

Availability:

https://doi.org/10.7916/d8-tnfc-et36

Accession Number:

edsbas.DC9F2C00

Database:

BASE

Further Information

Background: Statistical and machine learning techniques are now being incorporated into high-dimensional mixture research to overcome issues with traditional methods. Though some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. The research presented here concentrates on answering a single mixtures question: Are there exposure patterns within a mixture corresponding with sources or behaviors that give rise to exposure?

Objective: This dissertation details work to design, adapt, and apply pattern recognition methods to environmental mixtures and introduces two methods adapted to specific challenges of environmental health data: (1) Principal Component Pursuit (PCP) and (2) Bayesian non-parametric non-negative matrix factorization (BN²MF). We build on this work to characterize the relationship between identified patterns of in utero endocrine disrupting chemical (EDC) exposure and child neurodevelopment.

Methods: PCP, a dimensionality reduction technique from computer vision, decomposes the exposure mixture into a low-rank matrix of consistent patterns and a sparse matrix of unique or extreme exposure events. We incorporated two existing PCP extensions that suit environmental data: (1) a non-convex rank penalty and (2) a formulation that removes the need for parameter tuning. We further adapted PCP to accommodate environmental mixtures by including (1) a non-negativity constraint, (2) a modified algorithm to allow for missing values, and (3) a separate penalty for measurements below the limit of detection (PCP-LOD). BN²MF decomposes the exposure mixture into three parts: (1) a matrix of chemical loadings on identified patterns, (2) a matrix of individual scores on identified patterns, and (3) a diagonal matrix of pattern weights. It places non-negative continuous priors on pattern loadings, weights, and individual scores and uses a non-parametric sparse prior ...
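
The low-rank plus sparse decomposition described under Methods can be illustrated with a minimal sketch of the standard convex form of PCP (nuclear norm plus L1 penalty, solved with an augmented Lagrangian scheme). This is not the dissertation's adapted method: it omits the non-convex rank penalty, the non-negativity constraint, missing-value handling, and the LOD penalty. The function names, the synthetic data, and the default values of `lam` and `mu` (common choices in the robust PCA literature) are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(A, tau):
    """Shrink every entry of A toward zero by tau (proximal map of the L1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Shrink the singular values of A by tau (proximal map of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp(X, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Standard convex PCP: split X into low-rank L and sparse S with L + S = X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, p))       # assumed default sparsity weight
    if mu is None:
        mu = n * p / (4.0 * np.abs(X).sum())  # assumed default step parameter
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                      # dual variable for L + S = X
    norm_X = np.linalg.norm(X)
    for _ in range(max_iter):
        L = singular_value_threshold(X - S + Y / mu, 1.0 / mu)
        S = soft_threshold(X - L + Y / mu, lam / mu)
        R = X - L - S                         # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_X:
            break
    return L, S


# Synthetic example (hypothetical data): 100 individuals, 20 chemicals,
# two underlying exposure patterns plus a few extreme exposure events.
rng = np.random.default_rng(0)
scores = rng.gamma(2.0, 1.0, size=(100, 2))
loadings = rng.gamma(2.0, 1.0, size=(2, 20))
spikes = np.zeros((100, 20))
mask = rng.random((100, 20)) < 0.05
spikes[mask] = rng.gamma(5.0, 5.0, size=mask.sum())
X = scores @ loadings + spikes

L, S = pcp(X)
rel_residual = np.linalg.norm(X - L - S) / np.linalg.norm(X)
```

In this sketch, `L` plays the role of the matrix of consistent patterns and `S` the matrix of unique or extreme events; `rel_residual` measures how closely the two parts sum back to the input.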
