Paper Title
Neural Collaborative Filtering Bandits via Meta Learning
Paper Authors
Paper Abstract
Contextual multi-armed bandits provide powerful tools for solving the exploitation-exploration dilemma in decision making, with direct applications in personalized recommendation. In practice, collaborative effects among users carry significant potential to improve recommendations. In this paper, we introduce and study the problem of `Neural Collaborative Filtering Bandits', where the rewards can be non-linear functions and user groups are formed dynamically given different specific contents. To solve this problem, inspired by meta-learning, we propose Meta-Ban (meta-bandits), in which a meta-learner is designed to represent and rapidly adapt to dynamic groups, coupled with a UCB-based exploration strategy. Furthermore, we show that Meta-Ban achieves a regret bound of $\mathcal{O}(\sqrt{T \log T})$, improving on state-of-the-art related works by a multiplicative factor of $\sqrt{\log T}$. Finally, we conduct extensive experiments showing that Meta-Ban significantly outperforms six strong baselines.
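To make the interaction pattern the abstract describes concrete, below is a minimal, hypothetical sketch, not the paper's actual Meta-Ban algorithm: a shared meta-learner is copied and briefly fine-tuned on the current group's observations (the "rapid adaptation" idea from meta-learning), and arms are scored optimistically as predicted reward plus an exploration bonus. All names here (`RewardNet`, `select_arm`, `alpha`, `adapt_steps`) are illustrative assumptions, and the gradient-norm bonus is a simplified stand-in for the paper's UCB-based exploration term.

```python
import copy
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP modeling a possibly non-linear reward function of the context."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

def select_arm(meta_net, contexts, group_data, alpha=1.0, adapt_steps=5, lr=1e-2):
    """Copy the meta-learner, rapidly adapt it to the current (dynamic) group's
    observed (context, reward) pairs, then pick the arm with the highest
    optimistic score: predicted reward + exploration bonus."""
    # Adapt a copy so per-group fine-tuning never overwrites the shared meta-learner.
    net = copy.deepcopy(meta_net)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(adapt_steps):
        for x, r in group_data:
            opt.zero_grad()
            ((net(x) - r) ** 2).sum().backward()  # squared-error loss on one observation
            opt.step()

    n = max(len(group_data), 1)
    best_arm, best_score = 0, float("-inf")
    for a, x in enumerate(contexts):
        net.zero_grad()
        mu = net(x)
        mu.backward()  # gradient of the prediction w.r.t. the network parameters
        g = torch.cat([p.grad.flatten() for p in net.parameters()])
        # Simplified UCB-style bonus: gradient norm, shrinking as the group collects data.
        score = mu.item() + alpha * g.norm().item() / n ** 0.5
        if score > best_score:
            best_arm, best_score = a, score
    return best_arm

# Toy usage: 10 arms with 8-dimensional contexts and a few past group observations.
if __name__ == "__main__":
    torch.manual_seed(0)
    meta_net = RewardNet(dim=8)
    contexts = [torch.randn(8) for _ in range(10)]
    group_data = [(torch.randn(8), torch.randn(1)) for _ in range(20)]
    print("chosen arm:", select_arm(meta_net, contexts, group_data))
```

In a full algorithm the meta-learner itself would also be updated from the groups' feedback over rounds; the sketch only shows the per-round adapt-then-explore step under the stated assumptions.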