Paper Title

Distilling Knowledge for Fast Retrieval-based Chat-bots

Authors

Amir Vakili Tahami, Kamyar Ghajar, Azadeh Shakery

Abstract

Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chat-bots are typically employed in information seeking conversational systems such as customer support agents. In order to make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders performing full self-attention over the pair and bi-encoders encoding the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
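The distillation idea in the abstract can be sketched as follows: a slow but accurate cross-encoder (teacher) scores each candidate response, and a fast bi-encoder (student) is trained to match the teacher's score distribution over the candidate set, so that at inference time only the cheap bi-encoder dot product is needed. This is a minimal, generic sketch of listwise knowledge distillation with a temperature-softened KL objective; the temperature value and the exact loss form are assumptions for illustration, not necessarily the paper's precise formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions over one candidate set (a common listwise
    distillation objective; the temperature is an illustrative choice)."""
    p = softmax([s / temperature for s in teacher_scores])  # teacher targets
    q = softmax([s / temperature for s in student_scores])  # student outputs
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def bi_encoder_score(history_vec, response_vec):
    """Bi-encoder relevance score: dot product of independently
    encoded history and candidate-response vectors, so candidate
    encodings can be precomputed and cached."""
    return sum(h * r for h, r in zip(history_vec, response_vec))
```

When teacher and student agree on the candidate scores, the loss is zero; any disagreement yields a positive penalty that pushes the bi-encoder's rankings toward the cross-encoder's, which is why the speed-up comes "at no cost during inference time": the teacher is only used during training.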
