Paper Title

KLEJ: Comprehensive Benchmark for Polish Language Understanding

Authors

Rybak, Piotr, Mroczkowski, Robert, Tracz, Janusz, Gawlik, Ireneusz

Abstract

In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language, which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based models.
