Towards Robust Ranker for Text Retrieval

Paper Title

Towards Robust Ranker for Text Retrieval

Authors

Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Binxing Jiao, Daxin Jiang

Abstract

A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind: it learns from moderate negatives and/or serves as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noise caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. We therefore propose using multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noise makes the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results show that our model achieves new state-of-the-art effectiveness.
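The idea of using several retrievers as negative generators can be illustrated with a minimal sketch. The abstract does not give the actual sampling procedure, so everything below is an assumption made for illustration: the function names `pool_negatives` and `sample_negatives`, the retriever names, the passage ids, and the choice of uniform sampling from the union of the top-k lists are all hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Set


def pool_negatives(
    retrieved: Dict[str, Sequence[str]],
    positives: Set[str],
) -> List[str]:
    """Union the top-k lists of several retrievers, dropping labeled positives.

    `retrieved` maps a (hypothetical) retriever name to its ranked passage ids.
    Passages are kept in order of first appearance so the result is deterministic.
    """
    pooled: List[str] = []
    seen: Set[str] = set()
    for passage_ids in retrieved.values():
        for pid in passage_ids:
            if pid in positives or pid in seen:
                continue
            seen.add(pid)
            pooled.append(pid)
    return pooled


def sample_negatives(
    pooled: Sequence[str],
    num_negatives: int,
    rng: random.Random,
) -> List[str]:
    """Draw negatives uniformly without replacement from the pooled candidates.

    Uniform sampling is an assumption of this sketch, not the paper's method.
    """
    k = min(num_negatives, len(pooled))
    return rng.sample(list(pooled), k)


if __name__ == "__main__":
    # Hypothetical top-3 results from three retrievers for one query.
    retrieved = {
        "bm25": ["p1", "p2", "p3"],
        "dense_a": ["p2", "p4", "p5"],
        "dense_b": ["p5", "p6", "p1"],
    }
    positives = {"p2"}  # labeled relevant passage for the query
    pooled = pool_negatives(retrieved, positives)
    print(pooled)
    print(sample_negatives(pooled, 3, random.Random(0)))
```

Because the pool mixes candidates from retrievers with different error patterns, the sampled negatives are not tied to the noise distribution of any single retriever, which is the intuition the abstract describes.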
