Multimodal learning enables chat-based exploration of single-cell data.
Original Publication: New York, NY : Nature Pub. Co., [1996-
Single-cell sequencing characterizes biological samples at unprecedented scale and detail, but data interpretation remains challenging. Here, we present CellWhisperer, an artificial intelligence (AI) model and software tool for chat-based interrogation of gene expression. We establish a multimodal embedding of transcriptomes and their textual annotations, using contrastive learning on 1 million RNA sequencing profiles with AI-curated descriptions. This embedding informs a large language model that answers user-provided questions about cells and genes in natural-language chats. We benchmark CellWhisperer's performance for zero-shot prediction of cell types and other biological annotations and demonstrate its use for biological discovery in a meta-analysis of human embryonic development. We integrate a CellWhisperer chat box with the CELLxGENE browser, allowing users to interactively explore gene expression through a combined graphical and chat interface. In summary, CellWhisperer leverages large community-scale data repositories to connect transcriptomes and text, thereby enabling interactive exploration of single-cell RNA-sequencing data with natural-language chats.
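The abstract says only that the joint transcriptome-text embedding is trained "using contrastive learning" and then used for zero-shot prediction. The sketch below shows what such an objective commonly looks like, assuming a CLIP-style symmetric cross-entropy (InfoNCE) loss over a batch of matched transcriptome/description pairs. This is an assumption, not a confirmed detail of CellWhisperer. Zero-shot prediction is modeled as picking the label whose text embedding is most cosine-similar to a cell's embedding. The encoders are out of scope: the input vectors stand in for their outputs. The function names (`clip_loss`, `zero_shot_predict`) and the temperature of 0.07 are illustrative choices.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_loss(transcriptome_emb: List[Vector], text_emb: List[Vector],
              temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style symmetric InfoNCE loss.

    Row i of transcriptome_emb and row i of text_emb form the positive pair;
    every other pairing in the batch acts as a negative.
    """
    t = [l2_normalize(v) for v in transcriptome_emb]
    s = [l2_normalize(v) for v in text_emb]
    n = len(t)
    logits = [[dot(t[i], s[j]) / temperature for j in range(n)] for i in range(n)]
    # Transcriptome -> text: cross-entropy over each row, target on the diagonal.
    loss_t2s = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> transcriptome: cross-entropy over each column, target on the diagonal.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_s2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_t2s + loss_s2t) / 2


def zero_shot_predict(cell_emb: Vector, label_embs: Dict[str, Vector]) -> str:
    """Return the label whose text embedding is most cosine-similar to the cell."""
    c = l2_normalize(cell_emb)
    return max(label_embs, key=lambda k: dot(c, l2_normalize(label_embs[k])))


if __name__ == "__main__":
    cells = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_loss(cells, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near zero
    print(clip_loss(cells, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
    print(zero_shot_predict([0.9, 0.1], {"T cell": [1.0, 0.0], "B cell": [0.0, 1.0]}))
```

The symmetric form pulls each transcriptome toward its own description and pushes it away from the other descriptions in the batch, in both directions. Once training has aligned the two spaces, zero-shot annotation reduces to a nearest-text lookup and needs no label-specific classifier.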
(© 2025. The Author(s).)
Competing interests: C.B. is a cofounder and scientific advisor of Myllia Biotechnology and Neurolentech. The remaining authors declare no competing interests.