Paper Title

HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data

Paper Authors

Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Wang

Paper Abstract

Existing question answering datasets focus on dealing with homogeneous information, based either only on text or only on KB/table information. However, as human knowledge is distributed over heterogeneous forms, using homogeneous information alone might lead to severe coverage problems. To fill in the gap, we present HybridQA (https://github.com/wenhuchen/HybridQA), a new large-scale question-answering dataset that requires reasoning over heterogeneous information. Each question is aligned with a Wikipedia table and multiple free-form corpora linked to the entities in the table. The questions are designed to aggregate both tabular information and text information, i.e., the lack of either form would render the question unanswerable. We test three different models: 1) a table-only model, 2) a text-only model, and 3) a hybrid model that combines heterogeneous information to find the answer. The experimental results show that the EM scores obtained by the two baselines are below 20%, while the hybrid model can achieve an EM over 40%. This gap suggests the necessity of aggregating heterogeneous information in HybridQA. However, the hybrid model's score is still far behind human performance. Hence, HybridQA can serve as a challenging benchmark for studying question answering over heterogeneous information.
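
For reference, the EM (exact match) metric quoted above counts a prediction as correct only if it equals a gold answer string after normalization. Below is a minimal Python sketch assuming SQuAD-style normalization (lowercasing, removing punctuation and articles, collapsing whitespace); the official HybridQA evaluation script may differ in details, and the example predictions are hypothetical.

import re
import string

def normalize_answer(s):
    # SQuAD-style normalization: lowercase, drop punctuation,
    # remove articles, and collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    # Scores 1 if the normalized prediction equals any normalized gold answer.
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(g) for g in gold_answers))

# Hypothetical (prediction, gold answers) pairs, for illustration only.
examples = [
    ("The Beatles", ["Beatles"]),   # matches after article removal
    ("1969", ["1968"]),             # no match
]
em = 100.0 * sum(exact_match(p, g) for p, g in examples) / len(examples)
print(f"EM: {em:.1f}%")  # EM: 50.0%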
