Paper Title


Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

Authors

Hofstätter, Sebastian, Zlabinger, Markus, Sertkan, Mete, Schröder, Michael, Hanbury, Allan

Abstract

There are many existing retrieval and question answering datasets. However, most of them either focus on ranked list evaluation or single-candidate question answering. This divide makes it challenging to properly evaluate approaches concerned with ranking documents and providing snippets or answers for a given query. In this work, we present FiRA: a novel dataset of Fine-Grained Relevance Annotations. We extend the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage and word level graded relevance annotations for all relevant documents. We use our newly created data to study the distribution of relevance in long documents, as well as the attention of annotators to specific positions of the text. As an example, we evaluate the recently introduced TKL document ranking model. We find that although TKL exhibits state-of-the-art retrieval results for long documents, it misses many relevant passages.
