An Enhanced Neural Machine Translation with Pre-Trained Contextual Encoding Knowledge and Data Augmentation for Low-Resource Khasi Language
Abstract
This paper explores the potential of neural machine translation (NMT) for building a translation system for low-resource languages. The method builds on the transformer framework with substantial enhancements to improve translation quality: data augmentation on the input sentences, initialization of the embedding-layer weights with pre-trained embeddings, and a new layer in the encoder block that fuses pre-trained contextual information with the local context encoding. A Khasi-English language pair was used to demonstrate our work. The resulting translation system shows encouraging performance on the BLEU, METEOR, and ROUGE-L evaluation metrics. In a comparative analysis with existing models, the proposed model outperforms statistical machine translation (SMT) and long short-term memory (LSTM) networks. Furthermore, a semantic score between reference and candidate sentences was computed to enable semantic comparison. The findings indicate that the proposed NMT approach has significant potential as an automated solution for translating low-resource languages.
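Two of the enhancements described above — initializing the embedding layer with pre-trained vectors and fusing pre-trained contextual states with the local encoder output — can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the dimensions are hypothetical, the pre-trained vectors are random stand-ins, and the gated-fusion mechanism is one plausible way such a fusion layer could be realized, since the abstract does not specify the exact formulation.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper does not state its dimensions.
VOCAB_SIZE, EMB_DIM, N_HEADS = 1000, 64, 4

# 1. Initialize the embedding layer with pre-trained vectors
#    (random stand-ins here for real pre-trained embeddings).
pretrained = torch.randn(VOCAB_SIZE, EMB_DIM)
embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
embedding.weight.data.copy_(pretrained)

class FusionEncoderLayer(nn.Module):
    """A standard transformer encoder layer followed by a gated
    fusion of pre-trained contextual states with local encodings.
    (The gating scheme is an assumption for illustration.)"""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.local = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x, pretrained_ctx):
        h = self.local(x)  # local context encoding
        g = torch.sigmoid(self.gate(torch.cat([h, pretrained_ctx], -1)))
        return g * h + (1 - g) * pretrained_ctx  # fused representation

tokens = torch.randint(0, VOCAB_SIZE, (2, 7))  # (batch, seq_len)
x = embedding(tokens)
# Stand-in for contextual states from a pre-trained encoder:
ctx = torch.randn(2, 7, EMB_DIM)
layer = FusionEncoderLayer(EMB_DIM, N_HEADS)
out = layer(x, ctx)
print(out.shape)  # torch.Size([2, 7, 64])
```

The gate lets the model learn, per position and per dimension, how much to trust the pre-trained context versus the locally computed encoding.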
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.