Paper Title
On Efficiently Acquiring Annotations for Multilingual Models
Paper Authors
Paper Abstract
When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by zero-shot transfer to the remaining languages. In this work, we show that the strategy of joint learning across multiple languages using a single model performs substantially better than the aforementioned alternatives. We also demonstrate that active learning provides additional, complementary benefits. We show that this simple approach makes the model data-efficient by allowing it to arbitrate its annotation budget to query languages about which it is less certain. We illustrate the effectiveness of our proposed method on a diverse set of tasks: a classification task with 4 languages, a sequence tagging task with 4 languages, and a dependency parsing task with 5 languages. Our proposed method, whilst simple, substantially outperforms the other viable alternatives for building a model in a multilingual setting under constrained budgets.
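The abstract describes arbitrating a shared annotation budget via uncertainty-based active learning over a joint multilingual pool. Below is a minimal sketch of that idea, assuming a classifier that exposes a `predict_proba(text)` scoring call; the `Example` type, the entropy scorer, and `select_batch` are illustrative names and not the paper's actual implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical example type: an unlabeled sentence plus its language code.
@dataclass
class Example:
    text: str
    lang: str

def entropy(probs):
    """Predictive entropy of a class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_batch(model, pool, budget):
    """Rank the joint multilingual pool by model uncertainty and take the
    top-`budget` examples to send for annotation.

    `model.predict_proba(text)` is an assumed interface returning a class
    distribution; substitute your own model's scoring call.
    """
    scored = [(entropy(model.predict_proba(ex.text)), ex) for ex in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:budget]]
```

Because the pool is ranked jointly rather than per language, the annotation budget naturally flows toward the languages where the shared model is most uncertain, which is the data-efficiency mechanism the abstract highlights.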