Paper Title


Adam Mickiewicz University at WMT 2022: NER-Assisted and Quality-Aware Neural Machine Translation

Authors

Artur Nowakowski, Gabriela Pałka, Kamil Guttmann, Mikołaj Pokrywka

Abstract


This paper presents Adam Mickiewicz University's (AMU) submissions to the constrained track of the WMT 2022 General MT Task. We participated in the Ukrainian $\leftrightarrow$ Czech translation directions. The systems are a weighted ensemble of four models based on the Transformer (big) architecture. The models use source factors to utilize the information about named entities present in the input. Each of the models in the ensemble was trained using only the data provided by the shared task organizers. A noisy back-translation technique was used to augment the training corpora. One of the models in the ensemble is a document-level model, trained on parallel and synthetic longer sequences. During the sentence-level decoding process, the ensemble generated an n-best list, which was merged with the n-best list generated by a single document-level model that translated multiple sentences at a time. Finally, existing quality estimation models and minimum Bayes risk decoding were used to rerank the n-best list so that the best hypothesis was chosen according to the COMET evaluation metric. According to the automatic evaluation results, our systems rank first in both translation directions.
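The final reranking step described above can be illustrated with a minimal sketch of minimum Bayes risk (MBR) decoding over an n-best list: each hypothesis is scored by its expected utility against the other hypotheses, which serve as pseudo-references. The paper uses COMET-based quality estimation models as the utility; the token-overlap F1 `utility` below is a hypothetical stand-in, as is the sample `nbest` list.

```python
def utility(hyp: str, ref: str) -> float:
    """Toy utility: token-level F1 overlap (a stand-in for a learned
    metric such as COMET, which the paper actually uses)."""
    h, r = hyp.split(), ref.split()
    common = len(set(h) & set(r))
    if common == 0:
        return 0.0
    prec, rec = common / len(h), common / len(r)
    return 2 * prec * rec / (prec + rec)

def mbr_rerank(nbest: list[str]) -> str:
    """Return the hypothesis with the highest expected utility, using
    the n-best list itself as the set of pseudo-references."""
    def expected_utility(i: int) -> float:
        others = [utility(nbest[i], r) for j, r in enumerate(nbest) if j != i]
        return sum(others) / len(others)
    best = max(range(len(nbest)), key=expected_utility)
    return nbest[best]

# Hypothetical n-best list; the hypothesis most "central" to the list wins.
nbest = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the cat is on a mat",
    "dogs bark loudly",
]
print(mbr_rerank(nbest))  # → "a cat sat on the mat"
```

Note that MBR selection is quadratic in the size of the n-best list, since every hypothesis is scored against every other one; with a neural metric as the utility this pairwise scoring dominates the cost of the reranking step.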
