Paper Title

FQuAD: French Question Answering Dataset

Paper Authors

Martin d'Hoffschmidt, Wacim Belblidia, Tom Brendlé, Quentin Heinrich, Maxime Vidal

Abstract

Recent advances in the field of language modeling have improved state-of-the-art results on many Natural Language Processing tasks. Among them, Reading Comprehension has made significant progress over the past few years. However, most results are reported in English since labeled resources available in other languages, such as French, remain scarce. In the present work, we introduce the French Question Answering Dataset (FQuAD). FQuAD is a French Native Reading Comprehension dataset of questions and answers on a set of Wikipedia articles that consists of 25,000+ samples for the 1.0 version and 60,000+ samples for the 1.1 version. We train a baseline model which achieves an F1 score of 92.2 and an exact match ratio of 82.1 on the test set. In order to track the progress of French Question Answering models we propose a leader-board and we have made the 1.0 version of our dataset freely available at https://illuin-tech.github.io/FQuAD-explorer/.
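
The abstract reports two metrics, F1 and exact match, without defining them. As a rough illustration only, the sketch below computes both for a single prediction in the style commonly used for SQuAD-like datasets. The function names and the normalization (lowercasing and whitespace splitting) are assumptions made for this example; the official FQuAD evaluation may normalize answers differently, for instance by handling punctuation or French articles.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only.
    # A real evaluation script may also strip punctuation and articles.
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized token sequences are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    # Token-level F1: harmonic mean of precision and recall
    # over the multiset of tokens shared by prediction and reference.
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Paris", "paris"))
    print(f1_score("la tour Eiffel", "tour Eiffel"))
```

Dataset-level scores such as those quoted in the abstract are typically obtained by averaging these per-question values over the test set, taking the maximum over reference answers when a question has several.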
