Paper Title

Aspects of Terminological and Named Entity Knowledge within Rule-Based Machine Translation Models for Under-Resourced Neural Machine Translation Scenarios

Authors

Torregrosa, Daniel; Pasricha, Nivranshu; Masoud, Maraim; Chakravarthi, Bharathi Raja; Alonso, Juan; Casas, Noe; Arcan, Mihael

Abstract

Rule-based machine translation is a machine translation paradigm in which linguistic knowledge is encoded by an expert in the form of rules that translate text from the source to the target language. While this approach grants extensive control over the output of the system, the cost of formalising the necessary linguistic knowledge is much higher than that of training a corpus-based system, in which a machine learning approach is used to learn to translate automatically from examples. In this paper, we describe different approaches to leveraging the information contained in rule-based machine translation systems to improve a corpus-based system, namely a neural machine translation model, with a focus on a low-resource scenario. Three kinds of information were used: morphological information, named entities and terminology. In addition to evaluating the general performance of the system, we systematically analysed the performance of the proposed approaches on the targeted phenomena. Our results suggest that the proposed models have a limited ability to learn from external information and that most approaches do not significantly alter the results of the automatic evaluation; however, our preliminary qualitative evaluation shows that, in certain cases, the hypotheses generated by our system exhibit favourable behaviour, such as preserving the use of the passive voice.
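
The abstract does not say how the external knowledge is passed to the neural model. One widely used way to expose token-level information such as named entities or terminology is to attach a parallel "factor" label to every source token (factored input). The sketch below illustrates only that general idea and is not the authors' implementation: the lexicon `LEXICON`, the labels `NE`/`TERM`, the BIO tagging scheme, the `|` separator and the helper names `annotate` and `to_factored` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for the named-entity / terminology
# knowledge a rule-based system might provide; keys are source token tuples.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("European", "Union"): "NE",
    ("machine", "translation"): "TERM",
}


def annotate(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Return one BIO-style factor per token using greedy longest match."""
    max_len = max((len(k) for k in lexicon), default=0)
    factors = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(tokens[i:i + n]))
            if label is not None:
                factors[i] = "B-" + label
                for j in range(i + 1, i + n):
                    factors[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return factors


def to_factored(tokens: List[str], factors: List[str], sep: str = "|") -> str:
    """Join each token with its factor, e.g. 'Union|I-NE' (separator is an assumption)."""
    return " ".join(f"{t}{sep}{f}" for t, f in zip(tokens, factors))


if __name__ == "__main__":
    toks = "the European Union funds machine translation research".split()
    print(to_factored(toks, annotate(toks, LEXICON)))
```

Run as a script, this prints `the|O European|B-NE Union|I-NE funds|O machine|B-TERM translation|I-TERM research|O`. Greedy longest match keeps the sketch short; NMT toolkits differ in how they expect such factors to be supplied (inline or as separate files), so the output format would need adapting to the toolkit actually used.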
