Paper Title

Practical Annotation Strategies for Question Answering Datasets

Paper Authors

Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel

Paper Abstract

Annotating datasets for question answering (QA) tasks is very costly, as it requires intensive manual labor and often domain-specific knowledge. Yet strategies for annotating QA datasets in a cost-effective manner are scarce. To provide a remedy for practitioners, our objective is to develop heuristic rules for annotating a subset of questions, so that the annotation cost is reduced while maintaining both in- and out-of-domain performance. For this, we conduct a large-scale analysis in order to derive practical recommendations. First, we demonstrate experimentally that more training samples often contribute only to a higher in-domain test-set performance, but do not help the model generalize to unseen datasets. Second, we develop a model-guided annotation strategy: it makes a recommendation with regard to which subset of samples should be annotated. Its effectiveness is demonstrated in a case study based on domain customization of QA to a clinical setting. Here, remarkably, annotating a stratified subset with only 1.2% of the original training set achieves 97.7% of the performance as if the complete dataset had been annotated. Hence, the labeling effort can be reduced immensely. Altogether, our work fulfills a demand in practice when labeling budgets are limited and recommendations are thus needed for annotating QA datasets more cost-effectively.
