Result: A unified knowledge graph linking foodomics to chemical-disease networks and flavor profiles.
Further information
Modern nutrition science still lacks a comprehensive, machine-readable map linking diet to molecular composition and biological effects. Here we present FoodAtlas, a large-scale knowledge graph that links 1430 foods to 3610 chemicals, 2181 diseases, and 958 flavor descriptors through 96,981 provenance-tracked edges. A transformer-based text-mining pipeline extracted 48,474 quantitative food-chemical associations from 125,723 literature sentences (F₁ = 0.67) and integrated them with 23,211 chemical-disease assertions from the Comparative Toxicogenomics Database, 15,222 chemical-bioactivity records from ChEMBL, 3645 flavor annotations from FlavorDB and PubChem, and 6429 taxonomic relationships. Graph embeddings revealed six dietary modules whose signature metabolites delineate distinct, multisystem disease-risk trajectories. Models built on FoodAtlas demonstrate practical utility: a bioactivity predictor achieved strong correlation with antioxidant assays (R² = 0.52; ρ = 0.72), and a substitution engine reduced simulated total disease risk by 11.9%.
(© 2026. The Author(s).)
Competing interests: The authors declare no competing interests.
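Illustrative sketch: the abstract describes a graph whose edges are typed (food-chemical, chemical-disease, and so on) and provenance-tracked. The Python below is a minimal toy version of that idea, not the FoodAtlas implementation. The class names, field names, relation labels, and all entities (`food_A`, `chem_X`, `disease_1`, the source identifiers) are hypothetical placeholders chosen for illustration and do not come from the paper or its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One typed edge with provenance (all field names are hypothetical)."""
    head: str
    relation: str
    tail: str
    source: str  # where the assertion came from, e.g. a sentence or record ID


class ToyGraph:
    """Tiny in-memory graph keyed by head node; an assumption, not the paper's schema."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, head, relation, tail, source):
        self._out[head].append(Edge(head, relation, tail, source))

    def neighbors(self, head, relation):
        # Outgoing edges of one relation type from the given node.
        return [e for e in self._out[head] if e.relation == relation]

    def food_to_diseases(self, food):
        """Follow food -> chemical -> disease, keeping both edges as evidence."""
        paths = []
        for fc in self.neighbors(food, "contains"):
            for cd in self.neighbors(fc.tail, "associated_with"):
                paths.append((cd.tail, fc, cd))
        return paths


# Placeholder entities only; none of these are real records.
graph = ToyGraph()
graph.add("food_A", "contains", "chem_X", source="sentence_001")
graph.add("food_A", "contains", "chem_Y", source="sentence_002")
graph.add("chem_X", "associated_with", "disease_1", source="database_record_17")

paths = graph.food_to_diseases("food_A")
for disease, fc, cd in paths:
    print(f"{fc.head} -> {fc.tail} -> {disease} "
          f"(evidence: {fc.source}, {cd.source})")
```

Keeping the `source` on every edge means each two-hop food-to-disease path can be traced back to the sentence or database record behind both of its links, which is the practical point of provenance tracking.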