Result: Augmenting microbial phylogenomic signal with tailored marker gene sets.

Title:
Augmenting microbial phylogenomic signal with tailored marker gene sets.
Source:
Nature Communications; 11/12/2025, Vol. 16 Issue 1, p1-12, 12p
Database:
Complementary Index

Further information

Phylogenetic marker genes are traditionally selected from a fixed collection of whole genomes representing major microbial phyla, covering only a small fraction of gene families. However, most microbial diversity resides in metagenome-assembled genomes (MAGs), which exhibit taxonomic imbalance and harbor gene families that do not fit the criteria for universal orthologs. To address these limitations, we introduce TMarSel, a software tool for automated, tailored marker selection for deep microbial phylogenomics that does not rely on expert opinion. TMarSel allows users to select a variable number of markers and copies based on KEGG and EggNOG gene family annotations, enabling a systematic evaluation of the phylogenetic signal from the entire gene family pool. We show that an expanded marker selection tailored to the input genomes yields more accurate phylogenetic trees than previous marker sets across simulated and real-world datasets of whole genomes and MAGs, even when the MAGs lack a fraction of their open reading frames. The selected markers have functional annotations related to metabolism, cellular processes, and environmental information processing, in addition to replication, translation, and transcription. TMarSel provides flexibility in the number of markers, copies, and annotation databases while remaining robust against taxonomic imbalance and incomplete genomic data. Marker genes used in microbial phylogenomics are limited to fixed gene sets selected from complete genomes. TMarSel is a flexible yet robust method for selecting any number of markers from genomes or MAGs, mitigating the impact of taxonomic imbalance and incomplete genomic data on tree quality. [ABSTRACT FROM AUTHOR]
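
The abstract describes choosing a user-defined number of markers and copies from annotated gene families but does not state the selection criterion. As a rough illustration only, the sketch below ranks gene families by how many input genomes carry them within a copy-number limit. This prevalence heuristic is an assumption and is not TMarSel's actual algorithm or interface; the function `select_markers`, its parameters, the genome names, and the KEGG-style family IDs are all hypothetical placeholders.

```python
from collections import Counter


def select_markers(copy_counts, n_markers, max_copies=1):
    """Pick the n_markers most prevalent gene families (illustrative only).

    copy_counts maps genome name -> {gene family ID: copy number}.
    A genome counts toward a family when it carries between 1 and
    max_copies copies of that family. This is an assumed heuristic,
    not the criterion used by TMarSel.
    """
    prevalence = Counter()
    for families in copy_counts.values():
        for family, copies in families.items():
            if 1 <= copies <= max_copies:
                prevalence[family] += 1
    # Sort by descending prevalence, then by family ID so ties are deterministic.
    ranked = sorted(prevalence.items(), key=lambda item: (-item[1], item[0]))
    return [family for family, _ in ranked[:n_markers]]


# Toy annotation table with placeholder genome names and family IDs.
copy_counts = {
    "genome_A": {"K00001": 1, "K00002": 2, "K00003": 1},
    "genome_B": {"K00001": 1, "K00002": 1},
    "mag_C": {"K00001": 1, "K00003": 3},
}

markers = select_markers(copy_counts, n_markers=2, max_copies=2)
print(markers)
```

Raising `max_copies` admits multi-copy families, and raising `n_markers` widens the marker set, which loosely mirrors the two knobs the abstract mentions.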

Copyright of Nature Communications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)