Paper Title

Hybrid CNN Based Attention with Category Prior for User Image Behavior Modeling

Paper Authors

Xin Chen, Qingtao Tang, Ke Hu, Yue Xu, Shihang Qiu, Jia Cheng, Jun Lei

Paper Abstract

User historical behaviors have proven useful for Click-Through Rate (CTR) prediction in online advertising systems. On Meituan, one of the largest e-commerce platforms in China, an item is typically displayed with its image, and whether a user clicks the item is usually influenced by that image, which implies that a user's image behaviors help capture the user's visual preferences and improve the accuracy of CTR prediction. Existing user image behavior models typically use a two-stage architecture, which first extracts visual embeddings of images with off-the-shelf Convolutional Neural Networks (CNNs), and then jointly trains a CTR model on those visual embeddings together with non-visual features. We find that this two-stage architecture is sub-optimal for CTR prediction. Meanwhile, the precisely labeled categories in online ad systems contain abundant visual prior information, which can enhance the modeling of user image behaviors. However, off-the-shelf CNNs without a category prior may extract category-unrelated features, limiting the CNN's expressive ability. To address these two issues, we propose a hybrid CNN based attention module that unifies a user's image behaviors and the category prior for CTR prediction. Our approach achieves significant improvements in both online and offline experiments on a billion-scale real serving dataset.
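
To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of a category-aware attention module that pools a user's historical image-behavior embeddings with the candidate item as the query and category embeddings as a prior. All module names, dimensions, and the fusion scheme are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Sketch only: a category-aware attention pooling over user image behaviors.
# Shapes, dimensions, and the scoring MLP are assumptions, not the paper's design.
import torch
import torch.nn as nn


class CategoryPriorAttention(nn.Module):
    def __init__(self, img_dim=64, cate_dim=16, hidden_dim=64, num_categories=1000):
        super().__init__()
        self.cate_emb = nn.Embedding(num_categories, cate_dim)
        # Scores each historical behavior against the candidate item,
        # conditioned on both behavior and candidate category embeddings.
        self.score_mlp = nn.Sequential(
            nn.Linear(2 * img_dim + 2 * cate_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, behav_img, behav_cate, target_img, target_cate):
        # behav_img:   (B, T, img_dim)  visual embeddings of clicked images
        # behav_cate:  (B, T)           category ids of those behaviors
        # target_img:  (B, img_dim)     visual embedding of the candidate item
        # target_cate: (B,)             category id of the candidate item
        T = behav_img.shape[1]
        b_cate = self.cate_emb(behav_cate)                              # (B, T, cate_dim)
        t_cate = self.cate_emb(target_cate).unsqueeze(1).expand(-1, T, -1)
        t_img = target_img.unsqueeze(1).expand(-1, T, -1)
        scores = self.score_mlp(
            torch.cat([behav_img, t_img, b_cate, t_cate], dim=-1)
        )                                                               # (B, T, 1)
        weights = torch.softmax(scores, dim=1)
        # Weighted sum over behaviors -> user visual interest vector,
        # which could then be concatenated with non-visual features for the CTR model.
        return (weights * behav_img).sum(dim=1)                         # (B, img_dim)
```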
