Paper Title
A Generic Network Compression Framework for Sequential Recommender Systems
Paper Authors
Paper Abstract
Sequential recommender systems (SRS) have become a key technology for capturing users' dynamic interests and generating high-quality recommendations. Current state-of-the-art sequential recommender models are typically based on a sandwich-structured deep neural network, where one or more middle (hidden) layers are placed between the input embedding layer and the output softmax layer. In general, these models require a large number of parameters (e.g., a large embedding dimension or a deep network architecture) to obtain their optimal performance. Despite their effectiveness, at some point further increasing the model size makes deployment on resource-constrained devices harder, resulting in longer response times and a larger memory footprint. To resolve these issues, we propose a compressed sequential recommendation framework, termed CpRec, in which two generic model-shrinking techniques are employed. Specifically, we first propose a block-wise adaptive decomposition to approximate the input and softmax matrices by exploiting the fact that items in SRS obey a long-tailed distribution. To reduce the parameters of the middle layers, we introduce three layer-wise parameter sharing schemes. We instantiate CpRec using a deep convolutional neural network with dilated kernels, giving consideration to both recommendation accuracy and efficiency. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4$\sim$8$\times$ compression ratios on real-world SRS datasets. Meanwhile, CpRec is faster during training/inference and, in most cases, outperforms its uncompressed counterpart.
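To make the block-wise adaptive decomposition idea concrete, the following is a minimal NumPy sketch of a long-tail-aware embedding table: items are partitioned by frequency into blocks, head items keep the full embedding dimension, and tail blocks store embeddings at a reduced dimension with a small projection matrix back to the full dimension. The block sizes, dimensions, and dimension-halving schedule here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 128                           # full embedding dimension
blocks = [1_000, 9_000, 90_000]   # item counts per frequency block (head -> tail),
                                  # assumed sizes for illustration

# Block k stores embeddings at reduced dimension d / 2^k plus a projection
# back to d; the head block keeps the full dimension and needs no projection.
tables, projections = [], []
for k, n_items in enumerate(blocks):
    dk = d // (2 ** k)
    tables.append(rng.standard_normal((n_items, dk)) * 0.01)
    projections.append(None if dk == d else rng.standard_normal((dk, d)) * 0.01)

def embed(item_id: int) -> np.ndarray:
    """Look up an item's embedding, projecting tail blocks up to dim d."""
    offset = 0
    for table, proj in zip(tables, projections):
        if item_id < offset + table.shape[0]:
            vec = table[item_id - offset]
            return vec if proj is None else vec @ proj
        offset += table.shape[0]
    raise IndexError(item_id)

# Parameter count of a full (uncompressed) table vs. the block-wise one.
full_params = sum(blocks) * d
comp_params = (sum(t.size for t in tables)
               + sum(p.size for p in projections if p is not None))
print(f"full: {full_params:,}  compressed: {comp_params:,}  "
      f"ratio: {full_params / comp_params:.1f}x")
```

Even this toy configuration roughly 3.6x-compresses the embedding table, because the 90% of items in the tail blocks dominate the parameter count and are stored at a fraction of the full dimension; the paper's reported 4$\sim$8$\times$ rates additionally compress the softmax matrix and share middle-layer parameters.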