Paper Title

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Authors

Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

Abstract

Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into EASY set while the rest as HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set. However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models.
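The abstract only states that biased data points are identified and separated into an EASY set, with the rest forming the HARD set. Below is a minimal sketch of that idea, assuming a hypothetical option-only baseline (`option_only_predict`; the function names and interface are illustrative, not the paper's exact procedure): data points that a baseline can answer correctly without seeing the passage or question are treated as biased (EASY), and the remainder as HARD.

```python
from typing import Callable, Dict, List, Tuple

# A data point in a multiple-choice reading comprehension set:
# context passage, question, answer options, and the gold label index.
Example = Dict[str, object]

def split_easy_hard(
    examples: List[Example],
    option_only_predict: Callable[[List[str]], int],
) -> Tuple[List[Example], List[Example]]:
    """Partition examples into a biased (EASY) set and the rest (HARD).

    `option_only_predict` is a hypothetical baseline that sees only the
    answer options (no context, no question). If such a baseline still
    picks the gold answer, the data point likely contains exploitable
    biases and is placed in the EASY set; otherwise it goes to HARD.
    """
    easy, hard = [], []
    for ex in examples:
        predicted = option_only_predict(ex["options"])
        if predicted == ex["label"]:
            easy.append(ex)
        else:
            hard.append(ex)
    return easy, hard

# Usage sketch with a trivial stand-in baseline (always picks the longest
# option), purely to illustrate the interface.
if __name__ == "__main__":
    data = [
        {"context": "...", "question": "...",
         "options": ["short", "a much longer distractor", "mid", "also mid"],
         "label": 1},
    ]
    longest = lambda opts: max(range(len(opts)), key=lambda i: len(opts[i]))
    easy_set, hard_set = split_easy_hard(data, longest)
    print(len(easy_set), len(hard_set))
```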
