Result: Utilizing LLM in a multitask model for definition modeling and reverse dictionary tasks.

Title:
Utilizing LLM in a multitask model for definition modeling and reverse dictionary tasks.
Source:
AIP Conference Proceedings; 2025, Vol. 3446 Issue 1, p1-9, 9p
Database:
Complementary Index

Further Information

The use of vector representations offers several benefits. By converting text into vector representations, a model can capture the semantic meaning of words and phrases in numerical form, which makes it possible to apply machine learning techniques to tasks such as text generation. Converting a vector back into a text representation that resembles the original text, however, remains a challenge. To address this issue, two tasks are examined in this study: definition modeling and the reverse dictionary task. We modified the unified model by replacing its transformer blocks with a pretrained large language model (LLM), namely GPT-2, aiming to improve performance on both tasks. Experiments were conducted in five languages (English, French, Spanish, Italian, and Russian) using three types of embeddings (SGNS, char-based, and ELECTRA) provided by the CODWOE dataset. The results showed that while GPT-2 excels at general language generation, its performance declined relative to the unified model on the reverse dictionary metrics (MSE and cosine similarity) and the definition modeling metrics (BLEU and max-BLEU). Mode collapse was also more pronounced in GPT-2, whereas the unified model produced more diverse definitions. This indicates that the GPT-2 architecture, which relies solely on decoder blocks, is less suited to these definition-related tasks, which appear to benefit more from an encoder-decoder approach. [ABSTRACT FROM AUTHOR]
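The reverse dictionary metrics mentioned in the abstract (MSE and cosine similarity between a predicted embedding and the gold word embedding) can be sketched as follows. This is an illustrative Python/NumPy snippet, not the paper's actual evaluation code; the function name is a hypothetical choice for this example.

```python
import numpy as np

def reverse_dictionary_scores(predicted, gold):
    """Score a reverse-dictionary prediction against the gold word embedding.

    Returns (mse, cosine_similarity): a lower MSE and a higher cosine
    similarity both indicate that the predicted vector is closer to gold.
    """
    predicted = np.asarray(predicted, dtype=float)
    gold = np.asarray(gold, dtype=float)
    # Mean squared error over the embedding dimensions.
    mse = float(np.mean((predicted - gold) ** 2))
    # Cosine similarity: dot product normalized by the vector magnitudes.
    cosine = float(np.dot(predicted, gold) /
                   (np.linalg.norm(predicted) * np.linalg.norm(gold)))
    return mse, cosine

# A perfect prediction scores MSE 0.0 and cosine similarity of (approximately) 1.0.
v = np.array([0.5, -1.0, 2.0])
print(reverse_dictionary_scores(v, v))
```

In a reverse dictionary evaluation, `predicted` would be the embedding the model generates from a definition and `gold` the dataset's embedding for the target word.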

Copyright of AIP Conference Proceedings is the property of American Institute of Physics and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)