Paper Title

MRCLens: an MRC Dataset Bias Detection Toolkit

Authors

Yifan Zhong, Haohan Wang, Eric P. Xing

Abstract

Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension (MRC), but evidence suggests that these models sometimes exploit dataset biases to make predictions and fail to generalize to out-of-sample data. While many other approaches have been proposed to address this issue from a computational perspective, such as new architectures or training procedures, we believe that a method allowing researchers to discover biases and adjust the data or the models at an earlier stage would be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. To help introduce the toolkit, we also provide a categorization of common biases in MRC.
