Result: Generation-Based Few-Shot BioNER via Local Knowledge Index and Dual Prompts.

Title:
Generation-Based Few-Shot BioNER via Local Knowledge Index and Dual Prompts.
Authors:
Li W; School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China., Wang H; School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China. wanghong106@163.com., Li W; School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China., Zhao J; School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China., Sun Y; Department of Computer Science, Virginia Tech, Blacksburg, 24061, USA.
Source:
Interdisciplinary sciences, computational life sciences [Interdiscip Sci] 2025 Dec; Vol. 17 (4), pp. 970-986. Date of Electronic Publication: 2025 May 10.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Springer-Verlag; Country of Publication: Germany; NLM ID: 101515919; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1867-1462 (Electronic); Linking ISSN: 1867-1462; NLM ISO Abbreviation: Interdiscip Sci; Subsets: MEDLINE
Imprint Name(s):
Original Publication: [Heidelberg] : Springer-Verlag
Grant Information:
62072290 National Natural Science Foundation of China; 61672329 National Natural Science Foundation of China; SDYKC2022053 Project of Shandong Province Higher Educational Science and Technology Program
Contributed Indexing:
Keywords: Biomedical named entity recognition; External knowledge; Few-shot learning; Prompt tuning; Transfer learning
Entry Date(s):
Date Created: 20250510; Date Completed: 20251202; Latest Revision: 20251202
Update Code:
20251202
DOI:
10.1007/s12539-025-00709-3
PMID:
40347393
Database:
MEDLINE

Further Information

Few-shot Biomedical Named Entity Recognition (BioNER) presents significant challenges due to limited training data and the presence of nested and discontinuous entities. To tackle these issues, a novel approach, GKP-BioNER (Generation-Based Few-Shot BioNER via Local Knowledge Index and Dual Prompts), is proposed. It redefines BioNER as a generation task by integrating hard and soft prompts. Specifically, GKP-BioNER constructs a localized knowledge index from a Wikipedia dump, enabling the retrieval of texts semantically relevant to the original sentence. The retrieved texts are then reordered so that the content most relevant to the input comes first, and they serve as hard prompts, helping the model address cases that demand domain-specific knowledge. Simultaneously, GKP-BioNER keeps the pre-trained model intact while introducing learnable parameters as soft prompts that guide the self-attention layers, allowing the model to adapt to the context. Moreover, the soft prompt mechanism is designed to support knowledge transfer across domains. Extensive experiments on five datasets demonstrate that GKP-BioNER significantly outperforms eight state-of-the-art methods. It shows robust performance in low-resource and complex scenarios across various domains, highlighting its strength in knowledge transfer and broad applicability.
(© 2025. International Association of Scientists in the Interdisciplinary Areas.)
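
The hard-prompt pipeline the abstract describes (index a Wikipedia dump locally, retrieve passages semantically related to the input sentence, rerank them, and prepend the best-ranked ones to the input) can be sketched in a few lines. The paper's actual retriever and index are not reproduced here: this stand-in uses TF-IDF cosine similarity over a toy passage list, and the hard_prompt helper and [SEP] delimiter are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the localized Wikipedia index; the paper builds this
# from a full Wikipedia dump, so these passages are purely illustrative.
passages = [
    "Aspirin is a nonsteroidal anti-inflammatory drug used to reduce pain.",
    "Cyclooxygenase is an enzyme involved in prostaglandin synthesis.",
    "The mitochondrion generates most of the cell's supply of ATP.",
]

vectorizer = TfidfVectorizer().fit(passages)
index = vectorizer.transform(passages)  # the "local knowledge index"

def hard_prompt(sentence, k=2):
    # Retrieve the k most similar passages, then reorder them so the
    # most semantically relevant content comes first (the reranking step).
    scores = cosine_similarity(vectorizer.transform([sentence]), index)[0]
    top = scores.argsort()[::-1][:k]
    context = " ".join(passages[i] for i in top)
    return f"{context} [SEP] {sentence}"  # [SEP] delimiter is an assumption

print(hard_prompt("Aspirin inhibits cyclooxygenase enzymes."))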
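
The soft-prompt side, which leaves the pre-trained weights untouched while learnable parameters steer the self-attention layers, is closest in spirit to prefix-tuning. The sketch below is a minimal stand-in, not the authors' architecture: it assumes a GPT-2 backbone and a Hugging Face transformers version that still accepts the legacy past_key_values tuple format, and it simply prepends trainable key/value vectors to every attention layer of the frozen model.

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():  # preserve the pre-trained weights
    p.requires_grad = False

cfg = model.config
PREFIX_LEN = 10
head_dim = cfg.n_embd // cfg.n_head

# One trainable (key, value) prefix per transformer layer.
prefix = nn.Parameter(
    0.02 * torch.randn(cfg.n_layer, 2, cfg.n_head, PREFIX_LEN, head_dim)
)

def past_kv(batch_size):
    # Expand to the batch and split into per-layer (key, value) pairs,
    # the legacy past_key_values format Hugging Face models accept.
    p = prefix.unsqueeze(1).expand(-1, batch_size, -1, -1, -1, -1)
    return tuple((layer[:, 0], layer[:, 1]) for layer in p)

tok = GPT2TokenizerFast.from_pretrained("gpt2")
batch = tok(["Aspirin inhibits cyclooxygenase."], return_tensors="pt")
bsz = batch["input_ids"].shape[0]

# The attention mask must also cover the virtual prefix positions.
mask = torch.cat(
    [torch.ones(bsz, PREFIX_LEN, dtype=batch["attention_mask"].dtype),
     batch["attention_mask"]], dim=1)

out = model(input_ids=batch["input_ids"], attention_mask=mask,
            past_key_values=past_kv(bsz), labels=batch["input_ids"])
out.loss.backward()  # gradients flow only into `prefix`

Only prefix receives gradients here, so an optimizer over that single parameter (for example, torch.optim.AdamW([prefix], lr=1e-3)) trains the soft prompt while the backbone stays frozen; keeping the prompt as a small detachable parameter is also what would make the cross-domain transfer the abstract mentions cheap.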

Declarations. Conflict of interest: The authors declare that they have no conflict of interest in this work. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Materials availability: Not applicable. Code availability: Code will be made available on request.