Paper Title
A Survey of Orthographic Information in Machine Translation
Paper Authors
Paper Abstract
Machine translation is an application of natural language processing that has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is the variation in orthographic conventions, which causes many problems for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve a machine translation system. This article offers a survey of research on the influence of orthography on machine translation of under-resourced languages. It introduces under-resourced languages in the context of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discuss the underlying assumptions that were made, and show how orthographic knowledge improves the performance of machine translation for under-resourced languages. We discuss different types of machine translation and describe a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts to exploit cognate information at different levels of machine translation and the lessons that can be drawn from them. Additionally, multilingual neural machine translation of closely related languages is given particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction.