Title

Mitigating Gender Bias in Machine Translation with Target Gender Annotations

Authors

Artūrs Stafanovičs, Toms Bergmanis, Mārcis Pinnis

Abstract

When translating "The secretary asked for details." to a language with grammatical gender, it might be necessary to determine the gender of the subject "secretary". If the sentence does not contain the necessary information, it is not always possible to disambiguate. In such cases, machine translation systems select the most common translation option, which often corresponds to the stereotypical translation, thus potentially exacerbating prejudice against and marginalisation of certain groups and people. We argue that the information necessary for an adequate translation cannot always be deduced from the sentence being translated and may even depend on external knowledge. Therefore, in this work, we propose to decouple the task of acquiring the necessary information from the task of learning to translate correctly when such information is available. To that end, we present a method for training machine translation systems to use word-level annotations containing information about the subject's gender. To prepare training data, we annotate regular source language words with the grammatical gender of the corresponding target language words. Using such data to train machine translation systems reduces their reliance on gender stereotypes when information about the subject's gender is available. Our experiments on five language pairs show that this improves accuracy on the WinoMT test set by up to 25.8 percentage points.
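The data-preparation step described above (copying the grammatical gender of each target word onto the source word aligned with it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes word alignments and target-side gender labels are already available (for example from a word aligner and a morphological tagger), and the function name `annotate_source`, the tag values `M`/`F`/`U`, and the `word|TAG` output format are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tag set (an assumption, not the paper's specification):
# "M"/"F" for masculine/feminine target words, "U" for unannotated source words.
UNANNOTATED = "U"


def annotate_source(
    src_tokens: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    tgt_genders: Dict[int, str],
) -> List[str]:
    """Attach the grammatical gender of aligned target words to source tokens.

    src_tokens:  tokenised source sentence
    alignment:   (source_index, target_index) word-alignment pairs
    tgt_genders: target_index -> "M" or "F" for target words whose
                 grammatical gender is known
    """
    tags = [UNANNOTATED] * len(src_tokens)
    for src_i, tgt_j in alignment:
        gender = tgt_genders.get(tgt_j)
        if gender is not None:
            # Simplification: if a source word aligns to several gendered
            # target words, the last one in alignment order wins.
            tags[src_i] = gender
    # Emit each token as "word|TAG", a common source-factor style format.
    return [f"{tok}|{tag}" for tok, tag in zip(src_tokens, tags)]


if __name__ == "__main__":
    src = ["The", "secretary", "asked", "for", "details", "."]
    # Assumed German reference "Die Sekretärin bat um Details ." aligned
    # one-to-one; only target word 1 ("Sekretärin") is given a gender tag here.
    alignment = [(i, i) for i in range(len(src))]
    print(" ".join(annotate_source(src, alignment, {1: "F"})))
    # → The|U secretary|F asked|U for|U details|U .|U
```

Under this assumed format, a system trained on such annotated data could be given `secretary|F` or `secretary|M` at translation time whenever the subject's gender is known from outside the sentence, while all other words keep the neutral `U` tag.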
