Result: Low dimensional structure in single cell data

Title:
Low dimensional structure in single cell data
Publication Year:
2024
Collection:
Columbia University: Academic Commons
Document Type:
Dissertation thesis
Language:
English
DOI:
10.7916/96ay-zy30
Accession Number:
edsbas.147D89D1
Database:
BASE

Further Information

This thesis presents three methods, each concerned with estimating interpretable low dimensional representations of high dimensional data. The first two chapters consider methods for fitting low dimensional nonlinear representations: Chapter 1 discusses the deterministic input, noisy "and" gate (DINA) model, and Chapter 2 discusses binary variational autoencoders. We present an application to single cell assay for transposase accessible chromatin sequencing (single cell ATACseq) data, where the DINA model uncovers meaningful discrete representations of cell state. In scientific applications, practitioners have substantial prior knowledge of the latent components driving variation in the data. The third chapter develops a supervised matrix factorization method, Spectra, that leverages annotations from experts and previous biological experiments to uncover latent representations of single cell RNAseq data.
Variational inference for the DINA model: The deterministic input, noisy "and" gate (DINA) model allows for matrix decomposition where latent factors are allowed to interact via an "and" relationship. We develop a variational inference approach for estimating the parameters of the DINA model. Previous approaches based on variational inference enumerate the space of latent binary parameters (requiring an exponential number of parameters) and cannot fit an unknown number of latent components. Here, we report that a practical mean field variational inference approach relying on a nonparametric cumulative shrinkage process prior and stochastic coordinate ascent updates achieves results competitive with existing methods while simultaneously determining the number of latent components. This approach allows scaling exploratory Q-matrix estimation to datasets of practical size with minimal hyperparameter tuning.
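To make the "and" relationship concrete, here is a minimal sketch of the DINA likelihood in its textbook slip/guess form. It is not the thesis's inference procedure, and the abstract does not state which parameterization the thesis uses. All function names, the toy Q-matrix, and the slip and guess values are hypothetical and chosen only for illustration. A feature is "on" with probability 1 - slip when the latent binary vector contains every component the Q-matrix row requires, and with probability guess otherwise.

```python
import random


def dina_ideal_response(alpha, q_row):
    """Return 1 only if alpha has every component that q_row requires (logical AND)."""
    return int(all(a == 1 for a, q in zip(alpha, q_row) if q == 1))


def dina_prob(alpha, q_row, slip, guess):
    """P(X = 1 | alpha): (1 - slip) when the ideal response is 1, else guess."""
    eta = dina_ideal_response(alpha, q_row)
    return (1.0 - slip) if eta == 1 else guess


def sample_dina(alphas, Q, slip, guess, seed=0):
    """Draw a binary observation matrix, one row per latent vector in alphas."""
    rng = random.Random(seed)
    return [
        [int(rng.random() < dina_prob(a, Q[j], slip[j], guess[j]))
         for j in range(len(Q))]
        for a in alphas
    ]


# Hypothetical toy setup: 3 observed features, 2 latent binary components.
Q = [[1, 0], [0, 1], [1, 1]]   # feature 3 requires both components
slip = [0.1, 0.1, 0.2]         # assumed values, illustration only
guess = [0.05, 0.05, 0.1]      # assumed values, illustration only
alphas = [[0, 0], [1, 0], [0, 1], [1, 1]]

# Success probability for each latent vector (rows) and feature (columns).
probs = [[dina_prob(a, Q[j], slip[j], guess[j]) for j in range(len(Q))]
         for a in alphas]
X = sample_dina(alphas, Q, slip, guess)
```

In this toy setup the third feature reaches its high-probability state only for the latent vector `[1, 1]`. That interaction is what separates DINA from an additive factor model, and it is why estimating the Q-matrix and the number of latent components is the hard part.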
Gradient estimation for binary latent variable models: In order to fit binary variational autoencoders, the gradient of the objective function must be estimated. Generally ...