Paper title
Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag
Paper authors
Abstract
The performance of Neural Machine Translation (NMT) depends significantly on the size of the available parallel corpus. As a result, low-resource language pairs show lower translation performance than high-resource language pairs. Translation quality degrades further when NMT is applied to morphologically rich languages. Even though the web contains a large amount of information, most people in Sri Lanka are unable to read and understand English properly. There is therefore a strong need to translate English content into local languages so that information can be shared among locals. Sinhala is the primary language of Sri Lanka, and building an NMT system that produces quality English-to-Sinhala translations is difficult because of the syntactic divergence between the two languages, compounded by low-resource constraints. Thus, in this research, we explore effective methods of incorporating Part-of-Speech (POS) tags into the Transformer input embedding and positional encoding to further enhance the performance of the baseline English-to-Sinhala neural machine translation model.
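The abstract does not say how the POS tags are combined with the Transformer inputs, so the following is only an illustrative sketch of one common option: summing a learned POS-tag embedding with the token embedding before adding the standard sinusoidal positional encoding. The function names, table shapes, and the choice of summation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sinusoidal_positions(seq_len, d_model):
    """Standard sinusoidal positional encoding (sin on even dims, cos on odd dims)."""
    positions = np.arange(seq_len)[:, None]          # shape (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # shape (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc


def embed_with_pos_tags(token_ids, tag_ids, token_table, tag_table):
    """Hypothetical input layer: token embedding + POS-tag embedding + position.

    Assumption: tags are injected by summation; the paper may use a
    different scheme (for example concatenation or a modified encoding).
    """
    token_ids = np.asarray(token_ids)
    tag_ids = np.asarray(tag_ids)
    if token_ids.shape != tag_ids.shape:
        raise ValueError("each token needs exactly one POS tag")
    d_model = token_table.shape[1]
    # Scale token embeddings by sqrt(d_model), as in the original Transformer.
    combined = token_table[token_ids] * np.sqrt(d_model) + tag_table[tag_ids]
    return combined + sinusoidal_positions(len(token_ids), d_model)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    token_table = rng.normal(size=(10, 8))   # toy vocabulary of 10 tokens
    tag_table = rng.normal(size=(4, 8))      # toy tag set of 4 POS tags
    out = embed_with_pos_tags([1, 2, 3], [0, 1, 2], token_table, tag_table)
    print(out.shape)
```

In a real system both tables would be trained jointly with the rest of the model, and the tag ids would come from an English POS tagger run over the source sentence.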