Paper Title

KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos

Authors

Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, Xiangnan He

Abstract

Recommender systems deployed in real-world applications can have inherent exposure bias, which leads to biased logged data that plagues researchers. A fundamental way to address this thorny problem is to collect users' interactions on randomly exposed items, i.e., the missing-at-random data. A few works have asked certain users to rate or select randomly recommended items, e.g., Yahoo!, Coat, and OpenBandit. However, these datasets are either too small or lack key information, such as unique user IDs or the features of users and items. In this work, we present KuaiRand, an unbiased sequential recommendation dataset containing millions of intervened interactions on randomly exposed videos, collected from the video-sharing mobile app Kuaishou. Unlike existing datasets, KuaiRand records 12 kinds of user feedback signals (e.g., click, like, and view time) on randomly exposed videos inserted into the recommendation feeds over two weeks. To facilitate model learning, we further collect rich features of users and items as well as users' behavior history. By releasing this dataset, we enable research on advanced debiasing in large-scale recommendation scenarios for the first time. With its distinctive features, KuaiRand can also support various other research directions, such as interactive recommendation, long sequential behavior modeling, and multi-task learning. The dataset and related news will be available at https://kuairand.com.
