Paper Title

Graph Enhanced BERT for Query Understanding

Authors

Juanhui Li, Yao Ma, Wei Zeng, Suqi Cheng, Jiliang Tang, Shuaiqiang Wang, Dawei Yin

Abstract

Query understanding plays a key role in exploring users' search intents and helping users locate the information they want most. However, it is inherently challenging since it needs to capture semantic information from short and ambiguous queries and often requires massive task-specific labeled data. In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks because they can extract general semantic information from large-scale corpora. Therefore, there are unprecedented opportunities to adopt PLMs for query understanding. However, there is a gap between the goal of query understanding and existing pre-training strategies: the goal of query understanding is to boost search performance, while existing strategies rarely consider this goal. Thus, directly applying them to query understanding is sub-optimal. On the other hand, search logs contain user clicks between queries and URLs, which provide rich information about users' search behavior on queries beyond their content. Therefore, in this paper, we aim to fill this gap by exploring search logs. In particular, to incorporate search logs into pre-training, we first construct a query graph where nodes are queries and two queries are connected if they lead to clicks on the same URLs. Then we propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph. In other words, GE-BERT can capture both the semantic information and the users' search behavioral information of queries. Extensive experiments on various query understanding tasks have demonstrated the effectiveness of the proposed framework.
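
The query graph construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_query_graph`, the `(query, url)` log format, and the toy log entries are all assumptions made for illustration, and the abstract does not say whether edges are weighted or filtered. The sketch simply connects two queries whenever at least one URL was clicked for both.

```python
from collections import defaultdict
from itertools import combinations


def build_query_graph(click_log):
    """Build an undirected query graph from (query, url) click pairs.

    Hypothetical helper: two queries become neighbors if they share
    at least one clicked URL. Returns a dict mapping each query to
    the set of its neighboring queries.
    """
    click_log = list(click_log)

    # Group queries by the URL that was clicked for them.
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        queries_by_url[url].add(query)

    # Every query is a node, even if it ends up with no neighbors.
    graph = {query: set() for query, _ in click_log}

    # Connect every pair of queries that share a clicked URL.
    for queries in queries_by_url.values():
        for a, b in combinations(sorted(queries), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy click log (made-up queries and URLs, for illustration only).
toy_log = [
    ("python tutorial", "docs.example/python"),
    ("learn python", "docs.example/python"),
    ("learn python", "courses.example/py101"),
    ("python course", "courses.example/py101"),
    ("weather today", "weather.example/now"),
]
graph = build_query_graph(toy_log)
```

In this toy log, "learn python" is linked to both "python tutorial" and "python course" through shared clicks, while "weather today" remains an isolated node. A real search log would likely need click-count thresholds or edge weights, which this sketch leaves out.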
