Paper title
Towards Robust Ranker for Text Retrieval
Paper authors
Paper abstract
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind: it learns from moderate negatives and/or serves as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noise caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. We therefore propose using multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noise makes the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
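The idea of using several retrievers as negative generators can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the sampling procedure, so the uniform sampling over a pooled candidate set, the function name `sample_negatives`, and the toy passage ids are all assumptions made for illustration.

```python
import random


def sample_negatives(retriever_rankings, positives, num_negatives, seed=0):
    """Pool candidates from several retrievers and sample negatives for ranker training.

    retriever_rankings: dict mapping a (hypothetical) retriever name to its
        ranked list of passage ids for one query.
    positives: set of passage ids labeled relevant for that query.
    Assumption: negatives are drawn uniformly from the deduplicated union of
    all retrievers' candidates; the paper may weight or sample differently.
    """
    rng = random.Random(seed)
    pool, seen = [], set()
    for ranking in retriever_rankings.values():
        for pid in ranking:
            # Skip labeled positives and candidates already pooled.
            if pid not in positives and pid not in seen:
                seen.add(pid)
                pool.append(pid)
    # Never request more negatives than the pool contains.
    return rng.sample(pool, min(num_negatives, len(pool)))


# Toy example: three retrievers return overlapping top-3 lists for one query.
rankings = {
    "bm25": ["p1", "p7", "p3"],
    "dense_a": ["p1", "p4", "p7"],
    "dense_b": ["p9", "p1", "p3"],
}
negatives = sample_negatives(rankings, positives={"p1"}, num_negatives=3)
```

Because the pool is the union of all retrievers' candidates, the sampled negatives mix the error patterns of each retriever rather than reflecting only one, which is the intuition the abstract gives for robustness to label noise.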