Paper Title
Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED
Paper Authors
Paper Abstract
DocRED is a widely used dataset for document-level relation extraction. In its large-scale annotation, a \textit{recommend-revise} scheme is adopted to reduce the workload. Within this scheme, annotators are provided with candidate relation instances from distant supervision, and they then manually supplement and remove relational facts based on the recommendations. However, when comparing DocRED with a subset relabeled from scratch, we find that this scheme results in a considerable number of false negative samples and an obvious bias towards popular entities and relations. Furthermore, we observe that models trained on DocRED have low recall on our relabeled dataset and inherit the same bias from the training data. Through an analysis of annotators' behaviors, we identify the underlying cause of the problems above: the scheme actually discourages annotators from supplementing adequate instances in the revision phase. We appeal to future research to take into consideration the issues with the recommend-revise scheme when designing new models and annotation schemes. The relabeled dataset is released at \url{https://github.com/AndrewZhe/Revisit-DocRED} to serve as a more reliable test set for document RE models.