Navigating the NLP and text-based conversational AI: a survey for problem formulation in Amharic texts.
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on enabling machines to understand natural language and to engage in human-like conversation. NLP could transform low-resource language domains, including Amharic, a primary language of Ethiopia. However, progress in NLP for localized Amharic applications remains limited, and as a result Ethiopia has lagged behind in exploring and implementing such advances. This study conducts a systematic review and synthesis of peer-reviewed research on NLP and conversational AI, with special reference to Amharic, to identify research gaps, including the limitations and challenges that hinder progress. Its primary motivation was to identify critical gaps related to text-based conversational AI for Amharic. The review combined qualitative and quantitative approaches to analyze and synthesize the extracted facts and findings. Following the PRISMA guidelines, related and relevant studies were selected and filtered using inclusion and exclusion criteria. The repositories searched were Web of Science, Scopus, ScienceDirect, Google Scholar, IEEE Xplore, MDPI, NIH, and the ACM Library. In total, 88 peer-reviewed articles were analyzed, and critical limitations and challenges were identified across them. Most of the reviewed studies highlight the scarcity of Amharic datasets, which can therefore be considered a near-universal limitation. As for specific technical challenges, 51% of the studies identified a scarcity of computational resources, 44% noted localized linguistic complexity (Amharic morphology), and 20% mentioned the lack of evaluation metrics and benchmarking standards.
Furthermore, 50% of the studies reported performance gaps between recent and traditional NLP models, as well as a capability gap relative to English. Among the models employed, convolutional neural networks (CNNs) appeared in 11% of the studies, followed closely by support vector machines (SVMs) at 10%. The most common applications were sentiment analysis (16%), machine translation (15%), and text processing and classification (14%). The review also found that 62% of the studies relied heavily on supervised learning, with limited exploration of unsupervised or semi-supervised approaches. Dataset-related challenges predominantly involved data accessibility, standardization, data quality, annotation limitations, the lack of pre-trained embeddings, dataset size, and dataset imbalance. This systematic review highlights the critical challenges and limitations of existing Amharic NLP research, including the need for novel, generalizable models, data availability, computational resources, and the complexity of Amharic's rich morphology and syntax. These findings point to critical gaps in existing Amharic text-based conversational AI and its evaluation methodologies. Bridging these gaps can help advance language technology for Amharic. [ABSTRACT FROM AUTHOR]
Copyright of Discover Data is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)