Paper Title

KOBEST: Korean Balanced Evaluation of Significant Tasks

Paper Authors

Dohyeong Kim, Myeongjun Jang, Deuk Sin Kwon, Eric Davis

Paper Abstract

A well-formulated benchmark plays a critical role in spurring advancements in the natural language processing (NLP) field, as it allows objective and precise evaluation of diverse models. As modern language models (LMs) have become more elaborate and sophisticated, more difficult benchmarks that require linguistic knowledge and reasoning have been proposed. However, most of these benchmarks only support English, and great effort is necessary to construct benchmarks for other low-resource languages. To this end, we propose a new benchmark named Korean Balanced Evaluation of Significant Tasks (KoBEST), which consists of five Korean-language downstream tasks. Professional Korean linguists designed tasks that require advanced Korean linguistic knowledge. Moreover, our data is purely annotated by humans and thoroughly reviewed to guarantee high data quality. We also provide baseline models and human performance results. Our dataset is available on Hugging Face.
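As a quick illustration of how the released data might be accessed, the sketch below loads one of the tasks with the Hugging Face `datasets` library. The repository id `skt/kobest_v1` and the configuration names are assumptions about how the dataset is published on the Hub, not details stated in the abstract; check the dataset page for the exact identifiers.

```python
# A minimal sketch of loading KoBEST via the Hugging Face `datasets` library.
# The repository id "skt/kobest_v1" and the configuration names below are
# assumptions about how the dataset is published; verify them on the Hub.
from datasets import load_dataset

# The five downstream tasks are assumed to be exposed as separate configurations.
TASKS = ["boolq", "copa", "wic", "hellaswag", "sentineg"]

# Load one task; each configuration is expected to provide its own splits.
boolq = load_dataset("skt/kobest_v1", TASKS[0])

print(boolq)              # DatasetDict listing the available splits
print(boolq["train"][0])  # a single human-annotated example
```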
