Paper title
SamWalker++: Recommendation with Informative Sampling Strategy
Authors
Abstract
Recommendation from implicit feedback is a highly challenging task because reliable negative feedback data is lacking. Existing methods address this challenge by treating all unobserved data as negative (dislike) while down-weighting the confidence of these data. However, this treatment causes two problems: (1) Confidence weights of the unobserved data are usually assigned manually, which lacks flexibility and may introduce empirical bias when evaluating users' preferences. (2) To handle the massive volume of unobserved feedback data, most existing methods rely on stochastic inference and data sampling strategies. However, since a user is aware of only a very small fraction of the items in a large dataset, it is difficult for existing samplers to select informative training instances, that is, instances in which the user really dislikes the item rather than simply does not know of it. To address these two problems, we propose two novel recommendation methods, SamWalker and SamWalker++, that support both adaptive confidence assignment and efficient model learning. SamWalker models data confidence with a social network-aware function, which adaptively assigns different weights to different data according to users' social contexts. However, social network information may not be available in many recommender systems, which hinders the application of SamWalker. Thus, we further propose SamWalker++, which does not require any side information and models data confidence with a constructed pseudo-social network. We also develop fast random-walk-based sampling strategies for SamWalker and SamWalker++ to adaptively draw informative training instances, which can speed up gradient estimation and reduce sampling variance. Extensive experiments on five real-world datasets demonstrate the superiority of the proposed SamWalker and SamWalker++.
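The abstract does not spell out how the random-walk sampler works, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a plain user-item bipartite graph built from observed interactions and walks user -> item -> user -> item, so that sampled items come from users who share history with the target user. The function names, the `hops` parameter, and the 1/0 labelling are all hypothetical choices made for this sketch.

```python
import random
from collections import defaultdict


def build_bipartite(interactions):
    """Build user->items and item->users adjacency from (user, item) pairs."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)
    return user_items, item_users


def sample_by_random_walk(user, user_items, item_users, n_samples, hops=2, rng=None):
    """Draw (item, label) pairs for `user` via short walks on the bipartite graph.

    Assumption for this sketch: an item reached through users with overlapping
    history is more likely to have been seen by `user`, so an unobserved one
    (label 0) is a more plausible true negative than a uniformly drawn item.
    """
    rng = rng or random.Random()
    if not user_items.get(user):
        return []  # no observed feedback, so there is nowhere to walk from
    samples = []
    for _ in range(n_samples):
        current = user
        item = None
        for _ in range(hops):
            # Step from the current user to one of their items...
            item = rng.choice(sorted(user_items[current]))
            # ...then to a user who also interacted with that item.
            current = rng.choice(sorted(item_users[item]))
        label = 1 if item in user_items[user] else 0
        samples.append((item, label))
    return samples


if __name__ == "__main__":
    pairs = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c"), ("u3", "d")]
    ui, iu = build_bipartite(pairs)
    print(sample_by_random_walk("u1", ui, iu, n_samples=5, rng=random.Random(0)))
```

In this toy graph, item `d` is never sampled for `u1` because no walk from `u1` can reach it, which mirrors the intuition that items far from a user's neighbourhood are probably unknown to them rather than disliked.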