Paper Title
Understanding BERT Rankers Under Distillation
Paper Authors
Paper Abstract
Deep language models such as BERT, pre-trained on large corpora, have given a large performance boost to state-of-the-art information retrieval ranking systems. The knowledge embedded in such models allows them to pick up complex matching signals between passages and queries. However, their high computation cost during inference limits their deployment in real-world search scenarios. In this paper, we study whether and how the search knowledge within BERT can be transferred to a smaller ranker through distillation. Our experiments demonstrate that using a proper distillation procedure is crucial: it yields up to a nine-fold speedup while preserving state-of-the-art performance.
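The abstract does not specify the distillation procedure itself; as a rough, hedged illustration of the general idea of distilling a BERT teacher ranker into a smaller student ranker, the sketch below shows a standard soft-label distillation loss over per-query candidate scores. The function name, tensor shapes, and temperature value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, temperature=2.0):
    """Soft-label distillation: push the student's score distribution over a
    query's candidate passages toward the temperature-softened teacher
    distribution. Shapes are (batch, num_candidates) relevance logits.
    (Illustrative sketch; not the paper's exact procedure.)
    """
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    # batchmean KL divergence, scaled by T^2 as in standard knowledge distillation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    # Random stand-ins: 8 queries, 10 candidate passages each.
    teacher = torch.randn(8, 10)                       # frozen BERT teacher scores
    student = torch.randn(8, 10, requires_grad=True)   # smaller student ranker scores
    loss = distillation_loss(student, teacher)
    loss.backward()
    print(loss.item())
```

In practice this soft-label term is typically combined with a supervised ranking loss on labeled query-passage pairs, with the teacher kept frozen during student training.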