Title
Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions
Authors
Abstract
This paper describes our submission to the WMT20 sentence filtering task. We combine scores from (1) a custom LASER built for each source language, (2) a classifier built to distinguish positive and negative pairs by semantic alignment, and (3) the original scores included in the task devkit. For the mBART fine-tuning setup provided by the organizers, our method shows 7% and 5% relative improvement over the baseline in sacreBLEU score on the test set for Pashto and Khmer, respectively.
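The abstract does not say how the three scores are combined. As a minimal sketch only, assuming a weighted average of min-max-normalized scores (a hypothetical scheme, not necessarily the one used in the submission), the combination and ranking step might look like this. The function names, the weights, and the toy score values are all illustrative assumptions.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(
    score_sets: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of normalized score lists (assumed scheme).

    score_sets holds one list per scorer, each with one score per
    sentence pair, all in the same sentence-pair order.
    """
    if len(score_sets) != len(weights):
        raise ValueError("need exactly one weight per score set")
    normalized = [min_max(s) for s in score_sets]
    total = sum(weights)
    n_pairs = len(normalized[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normalized)) / total
        for i in range(n_pairs)
    ]


# Toy scores for three sentence pairs (made-up values for illustration).
laser = [0.2, 0.8, 0.5]        # (1) custom LASER similarity
classifier = [0.1, 0.9, 0.4]   # (2) semantic-alignment classifier
devkit = [1.0, 3.0, 2.0]       # (3) original devkit score

combined = combine_scores([laser, classifier, devkit], [1.0, 1.0, 1.0])

# Rank sentence pairs from best to worst by combined score.
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

Normalizing before averaging keeps a scorer with a wider numeric range (here the toy devkit scores) from dominating the others. In a filtering setting, the top-ranked pairs would then be kept up to the task's size budget.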