Title:
Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study.
Authors:
Yang R; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore.
Zeng Q; Department of Linguistics, Northwestern University, Evanston, IL, United States.
You K; Department of Computer Science, Yale University, New Haven, CT, United States.
Qiao Y; Yale School of Public Health, Yale University, New Haven, CT, United States.
Huang L; Department of Computer Science, Yale University, New Haven, CT, United States.
Hsieh CC; Department of Computer Science, Yale University, New Haven, CT, United States.
Rosand B; Department of Computer Science, Yale University, New Haven, CT, United States.
Goldwasser J; Department of Computer Science, Yale University, New Haven, CT, United States.
Dave A; Yale New Haven Hospital, Yale School of Medicine, Yale University, New Haven, CT, United States.
Keenan T; Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States.
Ke Y; Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore.
Hong C; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States.
Liu N; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore, Singapore.
Chew E; Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States.
Radev D; Department of Computer Science, Yale University, New Haven, CT, United States.
Lu Z; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States.
Xu H; Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States.
Chen Q; Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States.
Li I; Information Technology Center, University of Tokyo, Kashiwa, Japan; Smartor LLC, Tokyo, Japan.
Source:
Journal of medical Internet research [J Med Internet Res] 2024 Oct 03; Vol. 26, pp. e60601. Date of Electronic Publication: 2024 Oct 03.
Publication Type:
Journal Article; Evaluation Study
Language:
English
Journal Info:
Publisher: JMIR Publications
Country of Publication: Canada
NLM ID: 100959882
Publication Model: Electronic
Cited Medium: Internet
ISSN: 1438-8871 (Electronic)
Linking ISSN: 14388871
NLM ISO Abbreviation: J Med Internet Res
Subsets: MEDLINE
Imprint Name(s):
Publication: <2011- > : Toronto : JMIR Publications
Original Publication: [Pittsburgh, PA? : s.n., 1999-
Comments:
Update of: ArXiv. 2023 Dec 9:arXiv:2311.16588v2. (PMID: 41031083)
Grant Information:
K99 LM014024 United States LM NLM NIH HHS; UL1 TR001863 United States TR NCATS NIH HHS
Contributed Indexing:
Keywords: deep learning; generative artificial intelligence; healthcare; large language models; machine learning; natural language processing; retrieval-augmented generation
Entry Date(s):
Date Created: 20241003
Date Completed: 20241003
Latest Revision: 20251008
Update Code:
20251008
PubMed Central ID:
PMC11487205
DOI:
10.2196/60601
PMID:
39361955
Database:
MEDLINE

Background: Medical texts present significant domain-specific challenges, and curating them manually is time-consuming and labor-intensive. Natural language processing (NLP) algorithms have been developed to automate this work, and several biomedical text-processing toolkits have greatly improved the efficiency of handling unstructured text. However, these toolkits each emphasize different aspects of text processing, and none offers generation capabilities, which leaves a significant gap.
Objective: This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.
Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. We also conducted a physician validation to assess the quality of generated content beyond automated metrics.
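Note: The abstract does not specify how the knowledge graph is queried or how candidates are ranked. The following is only a minimal sketch of the general retrieve, rank, and prompt pattern behind RAG, assuming a toy in-memory graph of triples and a simple bag-of-words cosine ranking. The names TRIPLES, retrieve, and build_prompt are hypothetical and are not part of Ascle's API.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# A real medical knowledge graph would be far larger and curated.
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "may cause", "lactic acidosis"),
    ("insulin", "treats", "type 1 diabetes"),
    ("aspirin", "treats", "fever"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, triples, top_k=2):
    """Rank triples by lexical similarity to the question; keep the top_k hits."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(" ".join(t)))), t) for t in triples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop triples that share no token with the question.
    return [t for score, t in scored[:top_k] if score > 0]


def build_prompt(question, facts):
    """Put the retrieved facts in front of the question for a language model."""
    context = "\n".join(f"- {h} {r} {t}" for h, r, t in facts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


question = "What does metformin treat in type 2 diabetes?"
facts = retrieve(question, TRIPLES)
prompt = build_prompt(question, facts)
print(prompt)
```

The prompt would then be passed to a language model of choice. A production system would likely replace the lexical ranking with entity linking or dense embeddings, which this sketch does not attempt.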
Results: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5) and lower scores for accuracy (3.90/5) and completeness (3.31/5).
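Note: For readers unfamiliar with ROUGE-L, the metric scores a generated text by the longest common subsequence (LCS) of tokens it shares with a reference text. The sketch below shows the standard sentence-level definition, assuming whitespace tokenization and an equal weighting of precision and recall (beta = 1). The abstract does not state which implementation or settings the authors used, so the function names and defaults here are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score from LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Made-up example sentences, not taken from the study's data.
score = rouge_l("the patient has a fever", "the patient has high fever")
print(round(score, 2))
```

In this made-up pair the shared subsequence is "the patient has fever" (4 of 5 tokens on each side), so precision and recall are both 0.8.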
Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.
(©Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha Dave, Tiarnan Keenan, Yuhe Ke, Chuan Hong, Nan Liu, Emily Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.10.2024.)