Paper Title

A Graph Representation of Semi-structured Data for Web Question Answering

Authors

Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang

Abstract

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines with a rich information source for question answering (QA). Unlike plain-text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among the various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of the semantic information hidden in their structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial search engine verify the effectiveness of our approach. Our method improves the F1 score by 3.90 points over the state-of-the-art baselines.
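The abstract does not list the paper's node and edge categories, so the following is only a minimal sketch of the general idea of turning a Web table into a graph instead of flat text. It assumes a simple scheme: header and cell nodes connected by hypothetical `header_of`, `same_row`, and `same_column` relations. This is not the paper's categorization, and `StructureGraph` and `table_to_graph` are illustrative names.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class StructureGraph:
    """Hypothetical graph container: typed nodes plus labeled edges."""
    nodes: dict = field(default_factory=dict)  # node_id -> (kind, text)
    edges: set = field(default_factory=set)    # (src_id, dst_id, relation)


def table_to_graph(header, rows):
    """Build a graph from a header row and body rows (illustrative only)."""
    g = StructureGraph()
    for j, text in enumerate(header):
        g.nodes[f"h{j}"] = ("header", text)
    for i, row in enumerate(rows):
        for j, text in enumerate(row):
            g.nodes[f"c{i}_{j}"] = ("cell", text)
            # Assumed relation: link each cell to its column header, if any.
            if j < len(header):
                g.edges.add((f"h{j}", f"c{i}_{j}", "header_of"))
        # Assumed relation: cells in the same row describe the same record.
        for a, b in combinations(range(len(row)), 2):
            g.edges.add((f"c{i}_{a}", f"c{i}_{b}", "same_row"))
    # Assumed relation: cells in the same column share an attribute.
    n_cols = max((len(r) for r in rows), default=0)
    for j in range(n_cols):
        col = [i for i, r in enumerate(rows) if j < len(r)]
        for a, b in combinations(col, 2):
            g.edges.add((f"c{a}_{j}", f"c{b}_{j}", "same_column"))
    return g


g = table_to_graph(["City", "Country"], [["Paris", "France"], ["Lyon", "France"]])
print(len(g.nodes), len(g.edges))  # 6 8
```

For the two-row example, the graph has 2 header nodes and 4 cell nodes, plus 4 `header_of`, 2 `same_row`, and 2 `same_column` edges. A QA model could then reason over these explicit relations instead of over a flattened string.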
