Paper Title

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

Paper Authors

Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li

Paper Abstract

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length $T$ is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
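
As a minimal illustration of the two payoff models described in the abstract, the sketch below computes expected rewards under each one. The notation here (`theta` for arm-specific preference vectors, `beta` for the shared coefficient vector, `x`/`z` for arm-specific and shared context features) is assumed for illustration and is not taken from the paper itself.

```python
import numpy as np

# Hypothetical sketch of the disjoint and hybrid payoff models from the
# abstract; theta, beta, x, z are assumed names, not the paper's notation.

rng = np.random.default_rng(0)
n_arms, d, k = 4, 5, 3

# Disjoint payoffs: each arm a has its own preference vector theta[a].
# In the paper's setting these vectors are piecewise-stationary: they
# change abruptly at unknown times, asynchronously across arms.
theta = rng.normal(size=(n_arms, d))

def disjoint_reward(a, x):
    """Expected reward of arm a under the disjoint model: x^T theta_a."""
    return theta[a] @ x

# Hybrid payoffs: the reward additionally involves a coefficient vector
# beta shared by all arms, applied to shared context features z.
beta = rng.normal(size=k)

def hybrid_reward(a, x, z):
    """Expected reward under the hybrid model: z^T beta + x^T theta_a."""
    return beta @ z + theta[a] @ x

x_t = rng.normal(size=d)  # arm-specific context at time t
z_t = rng.normal(size=k)  # context shared by all arms at time t
print(disjoint_reward(0, x_t), hybrid_reward(0, x_t, z_t))
```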
