Result: Comparing methods for constructing and representing human pangenome graphs

Title:
Comparing methods for constructing and representing human pangenome graphs
Contributors:
Algorithmes pour les séquences biologiques - Sequence Bioinformatics, Institut Pasteur Paris (IP)-Université Paris Cité (UPCité), Collège Doctoral, Sorbonne Université (SU), Hub Bioinformatique et Biostatistique - Bioinformatics and Biostatistics HUB, R.C was supported by ANR Full-RNA, SeqDigger, Inception and PRAIRIE grants (ANR-22-CE45-0007, ANR-19-CE45-0008, PIA/ANR16-CONV-0005, ANR-19-P3IA-0001). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grants agreements No. 872539 and 956229., ANR-22-CE45-0007,full-RNA,Fouille non biaisée dans les banques de données RNA-seq massives(2022), ANR-19-CE45-0008,SeqDigger,Moteur de recherche de données de séquençage en génomique environnementale(2019), ANR-16-CONV-0005,INCEPTION,Institut Convergences pour l'étude de l'Emergence des Pathologies au Travers des Individus et des populatiONs(2016), ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), European Project: 872539,H2020-EU.1.3. - EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions, H2020-EU.1.3.3. - Stimulating innovation by means of cross-fertilisation of knowledge,H2020-MSCA-RISE-2019,PANGAIA(2020), European Project: 956229,H2020-EU.1.3. - EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions,ALPACA(2021)
Source:
ISSN: 1465-6906.
Publisher Information:
HAL CCSD
BioMed Central
Publication Year:
2023
Collection:
Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
Document Type:
Academic journal; article in journal/newspaper
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/pmid/38037131; info:eu-repo/grantAgreement//872539/EU/Pan-genome Graph Algorithms and Data Integration/PANGAIA; info:eu-repo/grantAgreement//956229/EU/ALgorithms for PAngenome Computational Analysis/ALPACA; pasteur-04385553; https://pasteur.hal.science/pasteur-04385553; https://pasteur.hal.science/pasteur-04385553/document; https://pasteur.hal.science/pasteur-04385553/file/s13059-023-03098-2.pdf; PUBMED: 38037131; PUBMEDCENTRAL: PMC10691155
DOI:
10.1186/s13059-023-03098-2
Rights:
http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession Number:
edsbas.8743BCFE
Database:
BASE

Further information

International audience

Background: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs.

Results: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: , , , and . We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci.

Conclusion: This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application.