Paper Title

DE-RRD: A Knowledge Distillation Framework for Recommender System

Paper Authors

SeongKu Kang, Junyoung Hwang, Wonbin Kweon, Hwanjo Yu

Paper Abstract

Recent recommender systems have started to employ knowledge distillation, a model compression technique that distills knowledge from a cumbersome model (teacher) to a compact model (student), to reduce inference latency while maintaining performance. The state-of-the-art methods have focused only on making the student model accurately imitate the predictions of the teacher model. However, they have a limitation in that the prediction results alone do not fully reveal the teacher's knowledge. In this paper, we propose a novel knowledge distillation framework for recommender systems, called DE-RRD, which enables the student model to learn from the latent knowledge encoded in the teacher model as well as from the teacher's predictions. Concretely, DE-RRD consists of two methods: 1) Distillation Experts (DE), which directly transfers the latent knowledge from the teacher model. DE exploits "experts" and a novel expert selection strategy to effectively distill the teacher's vast knowledge to a student with limited capacity. 2) Relaxed Ranking Distillation (RRD), which transfers the knowledge revealed by the teacher's predictions while considering relaxed ranking orders among items. Our extensive experiments show that DE-RRD outperforms the state-of-the-art competitors and achieves performance comparable to, or even better than, that of the teacher model with faster inference.
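The abstract only names the two components. As a concrete illustration, below is a minimal, hypothetical PyTorch sketch of what such distillation losses could look like: small "expert" MLPs that map the student's latent representation into the teacher's latent space with a Gumbel-softmax selection network (for DE), and a Plackett-Luce-style listwise loss that preserves the teacher's ordering among its top items while ignoring the order of the remaining items (for RRD). All names, dimensions, and the exact loss forms here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of DE- and RRD-style losses (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationExperts(nn.Module):
    """DE sketch: expert MLPs reconstruct the teacher's latent representation
    from the student's representation; a selection network conditioned on the
    teacher's representation softly picks which expert handles each sample."""

    def __init__(self, student_dim, teacher_dim, num_experts=5):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(student_dim, teacher_dim), nn.ReLU(),
                          nn.Linear(teacher_dim, teacher_dim))
            for _ in range(num_experts)
        ])
        self.selector = nn.Linear(teacher_dim, num_experts)

    def forward(self, student_repr, teacher_repr, tau=1.0):
        # Soft (Gumbel-softmax) selection over experts, driven by the teacher.
        logits = self.selector(teacher_repr)
        weights = F.gumbel_softmax(logits, tau=tau, hard=False)          # (B, E)
        recon = torch.stack([e(student_repr) for e in self.experts], 1)  # (B, E, D_t)
        recon = (weights.unsqueeze(-1) * recon).sum(dim=1)               # (B, D_t)
        # Latent-knowledge distillation loss: match the teacher's representation.
        return F.mse_loss(recon, teacher_repr)


def relaxed_ranking_loss(student_scores, top_idx, rest_idx):
    """RRD sketch: listwise (Plackett-Luce style) loss that pushes the student
    to keep the teacher's ranking among the teacher's top-K items, while the
    ordering among the remaining ('uninteresting') items is left unconstrained.
    top_idx: (B, K) long tensor of item indices in the teacher's ranked order.
    rest_idx: (B, M) long tensor of sampled remaining item indices."""
    s_top = student_scores.gather(1, top_idx)    # (B, K) teacher-ranked order
    s_rest = student_scores.gather(1, rest_idx)  # (B, M) order is irrelevant
    rest_sum = s_rest.exp().sum(dim=1, keepdim=True)  # (B, 1)
    # Reverse cumulative sum: the denominator at rank k covers items the
    # teacher ranked at or below k (no numerical stabilization, for brevity).
    below = torch.flip(torch.cumsum(torch.flip(s_top.exp(), dims=[1]), dim=1),
                       dims=[1])                 # (B, K)
    log_prob = s_top - torch.log(below + rest_sum)
    return -log_prob.sum(dim=1).mean()
```

In a full training loop, terms like these would typically be added to the student's base recommendation loss with separate weights controlling the strength of each distillation signal.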
