CAN: Feature Co-Action for Click-Through Rate Prediction

Paper Title

CAN: Feature Co-Action for Click-Through Rate Prediction

Authors

Weijie Bian, Kailun Wu, Lejian Ren, Qi Pi, Yujing Zhang, Can Xiao, Xiang-Rong Sheng, Yong-Nan Zhu, Zhangming Chan, Na Mou, Xinchen Luo, Shiming Xiang, Guorui Zhou, Xiaoqiang Zhu, Hongbo Deng

Abstract

Feature interaction has been recognized as an important problem in machine learning, and it is essential for click-through rate (CTR) prediction tasks. In recent years, Deep Neural Networks (DNNs) have been widely used in industrial CTR prediction tasks because they can automatically learn implicit nonlinear interactions from the original sparse features. However, the implicit feature interactions learned in DNNs cannot retain the full representation capacity of the original and empirical feature interactions (e.g., cartesian product) without loss. For example, a simple attempt to learn the combination of feature A and feature B, <A, B>, as an explicit cartesian product representation of a new feature can outperform previous implicit feature interaction models, including factorization machine (FM)-based models and their variations. In this paper, we propose a Co-Action Network (CAN) to approximate explicit pairwise feature interactions without introducing too many additional parameters. More specifically, given feature A and its associated feature B, their feature interaction is modeled by learning two sets of parameters: 1) the embedding of feature A, and 2) a Multi-Layer Perceptron (MLP) to represent feature B. The approximated feature interaction is obtained by passing the embedding of feature A through the MLP network of feature B. We refer to such a pairwise feature interaction as feature co-action, and such a Co-Action Network unit provides a very powerful capacity for fitting complex feature interactions. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models and the cartesian product method. Moreover, CAN has been deployed in the display advertisement system at Alibaba, obtaining a 12% improvement in CTR and an 8% improvement in Revenue Per Mille (RPM), which is a great improvement to the business.
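
The co-action unit described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names co_action and param_size, the tanh activation, the inclusion of bias terms, and the choice to return only the last layer's output are all hypothetical choices made for this example. The one idea taken from the abstract is that feature B's learned parameter vector is used as the weights of a small MLP, and feature A's embedding is the input to that MLP.

```python
import numpy as np


def param_size(in_dim, layer_dims):
    """Number of scalars feature B must store to define the MLP (weights plus biases)."""
    size = 0
    for out_dim in layer_dims:
        size += in_dim * out_dim + out_dim
        in_dim = out_dim
    return size


def co_action(emb_a, params_b, layer_dims):
    """Pass feature A's embedding through an MLP whose parameters come from feature B.

    emb_a:      1-D embedding of feature A, shape (in_dim,)
    params_b:   1-D parameter vector learned for feature B
    layer_dims: output width of each MLP layer (an assumed hyperparameter)
    """
    in_dim = emb_a.shape[0]
    if params_b.shape[0] != param_size(in_dim, layer_dims):
        raise ValueError("params_b does not match the requested MLP shape")

    h = emb_a
    offset = 0
    for out_dim in layer_dims:
        # Slice this layer's weight matrix and bias out of feature B's vector.
        w = params_b[offset:offset + in_dim * out_dim].reshape(in_dim, out_dim)
        offset += in_dim * out_dim
        b = params_b[offset:offset + out_dim]
        offset += out_dim
        h = np.tanh(h @ w + b)  # tanh is an assumption for this sketch
        in_dim = out_dim
    return h


# Usage: a 4-dim embedding for A and a 4 -> 4 -> 2 MLP defined by B.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=4)
params_b = rng.normal(size=param_size(4, [4, 2]))
interaction = co_action(emb_a, params_b, [4, 2])
```

In this sketch each value of feature B stores one parameter vector and each value of feature A stores one embedding, so the parameter count grows with the number of A values plus the number of B values. An explicit cartesian product would instead need one embedding per observed <A, B> pair.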
