Result: Quantifying recent variation and relatedness in human populations

Title:

Quantifying recent variation and relatedness in human populations

Authors:

Gusev, Alexander

Publication Year:

2012

Collection:

Columbia University: Academic Commons

Subject Terms:

Human genetics--Data processing, Population, Human population genetics, Population genetics, Genetics--Data processing, Genetics, Computer science

Document Type:

Dissertation thesis

Language:

English

Relation:

https://doi.org/10.7916/D86M3CHH

DOI:

10.7916/D86M3CHH

Availability:

https://doi.org/10.7916/D86M3CHH

Accession Number:

edsbas.CB0966AA

Database:

BASE

Further Information

Advances in the genetic analysis of humans have revealed a surprising abundance of local relatedness between purportedly unrelated individuals. Where common mutations classically inform us of ancient relationships, such segments of pairwise identical-by-descent (IBD) sharing from a common ancestor are the observable traces of recent inter-mating. Combining these two distinct sources of information can help disentangle the complex genetic structure and flux in human populations. When considered together with a heritable trait, the segments can also be used to interrogate unascertained rare variation and help in locating trait-affecting loci. This work presents methods for comprehensive analysis of population-wide IBD and explores applications to disease and the understanding of recent genetic variation. We propose several strategies for efficient detection of IBD segments in population genotype data. Our novel seed-based algorithm, GERMLINE, can reduce the computational burden of finding pairwise segments from quadratic to nearly linear time in a general population. We demonstrate that this approach is several orders of magnitude faster than the available all-pairs methods while maintaining higher accuracy. Next, we extend the GERMLINE technique to process cohorts of unlimited size by adaptively adjusting the search mechanism to meet resource restrictions. We confirm its effectiveness with an analysis of 50,000 individuals where contemporary methods can only process a few thousand. One drawback of these two algorithms is their dependence on phased haplotype data as input, a constraint that becomes harder to satisfy in large populations. We propose a solution to this problem with an algorithm that analyzes genotype data directly by exploring all potential haplotypes and scoring each putative segment based on linkage disequilibrium. This solution significantly outperforms available methods when applied to full sequence data and is computationally efficient enough to analyze thousands of sequenced genomes where ...
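
The abstract names a seed-based search but does not give its details. As a rough illustration only, the Python sketch below shows one way a seed-and-extend search over phased haplotypes could avoid comparing every pair directly: haplotypes are hashed window by window, and only haplotypes that land in the same bucket are treated as candidate pairs. The function name `find_shared_segments`, the parameters `window` and `min_windows`, and the toy data are assumptions made for this example and are not taken from the dissertation; the sketch requires exact matches and omits any handling of genotyping or phasing errors.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_segments(haplotypes, window=4, min_windows=2):
    """Return {(i, j): [(start, end), ...]} with i < j.

    Each (start, end) is a half-open marker range over which haplotypes
    i and j are identical in at least `min_windows` consecutive windows.
    Illustrative simplification: exact matching only, and trailing
    markers that do not fill a whole window are ignored.
    """
    n_windows = len(haplotypes[0]) // window
    open_runs = {}                 # pair -> first window of its current run
    segments = defaultdict(list)

    for w in range(n_windows):
        lo = w * window
        # Seed step: one hashing pass groups identical window contents.
        buckets = defaultdict(list)
        for idx, hap in enumerate(haplotypes):
            buckets[tuple(hap[lo:lo + window])].append(idx)

        # Only haplotypes sharing a bucket become candidate pairs.
        matched = set()
        for members in buckets.values():
            matched.update(combinations(members, 2))

        # Close runs that did not continue into this window.
        for pair in list(open_runs):
            if pair not in matched:
                start = open_runs.pop(pair)
                if w - start >= min_windows:
                    segments[pair].append((start * window, w * window))

        # Extend step: start a run for new pairs, keep existing runs open.
        for pair in matched:
            open_runs.setdefault(pair, w)

    # Flush runs that reach the end of the marker set.
    for pair, start in open_runs.items():
        if n_windows - start >= min_windows:
            segments[pair].append((start * window, n_windows * window))
    return dict(segments)


# Toy phased haplotypes (hypothetical data): 0 and 1 agree on markers 0-7.
toy = [
    [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
]
result = find_shared_segments(toy, window=4, min_windows=2)
print(result)
```

In this sketch the hashing pass is linear in the number of haplotypes per window, while the pair enumeration grows with bucket size, so the overall cost depends on how many haplotypes share each window pattern.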
