Fine-Tuning the mT5 Model on Bidirectional Myanmar and Tedim Chin Machine Translation System.
Nowadays, machine translation (MT) is a vital tool for overcoming language barriers, especially for underrepresented and low-resource languages. This study explores the effectiveness of the mT5 neural machine translation model in facilitating translation between Myanmar and Tedim Chin, two languages with limited digital resources. To conduct this research, we built a parallel corpus of 26,404 Myanmar-Tedim Chin general-domain sentence pairs, with the source sentences written in the Myanmar language. The data were collected from diverse domains and manually translated into Tedim Chin, resulting in a custom Myanmar-Tedim Chin corpus. A significant challenge in processing Myanmar text is its lack of explicit word boundaries, which necessitates robust segmentation techniques. To address this, we applied syllable-level and word-level segmentation methods as part of the preprocessing step. The segmented data were then used to fine-tune the model, and the model's performance was evaluated using BLEU scores and accuracy metrics. Despite Tedim Chin being a low-resource language, the mT5 model achieved promising results, indicating its suitability for translation tasks involving both Myanmar and Tedim Chin. This study highlights the effectiveness of the mT5 model, compared with the Transformer, Helsinki-NLP, and NLLB-200 models, in advancing machine translation for underrepresented languages, and provides a foundation for future research in this area.
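The syllable-level segmentation the abstract mentions is commonly done for Myanmar script with a rule-based regex in the style of the well-known sylbreak tool. The sketch below is an illustrative simplification, not the authors' actual preprocessing code: it inserts a boundary before each Myanmar consonant unless that consonant is stacked (preceded by the virama U+1039) or devowelized (followed by the asat U+103A or the virama).

```python
import re

# Sketch of rule-based Myanmar syllable segmentation (sylbreak-style).
# A syllable break is placed before a consonant (U+1000 to U+1021) that is
# neither stacked under the previous consonant (preceded by virama U+1039)
# nor killed / used as a coda (followed by asat U+103A or virama U+1039).
_BREAK = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?![\u103a\u1039])")

def syllable_segment(text: str) -> list[str]:
    """Split a Myanmar-script string into syllables."""
    marked = _BREAK.sub(r" \1", text)
    return marked.split()
```

For example, the word "မြန်မာ" ("Myanmar") segments into the two syllables "မြန်" and "မာ", since the asat-marked န stays attached to the first syllable. Word-level segmentation, by contrast, typically requires a dictionary or a statistical model on top of such syllable units.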