Paper Title


State Encoders in Reinforcement Learning for Recommendation: A Reproducibility Study

Paper Authors

Jin Huang, Harrie Oosterhuis, Bunyamin Cetinkaya, Thijs Rood, Maarten de Rijke

Paper Abstract


Methods for reinforcement learning for recommendation (RL4Rec) are increasingly receiving attention as they can quickly adapt to user feedback. A typical RL4Rec framework consists of (1) a state encoder that encodes the state, which stores the user's historical interactions, and (2) an RL method that takes actions and observes rewards. Prior work compared four state encoders in an environment where user feedback is simulated based on real-world logged user data. An attention-based state encoder was found to be the optimal choice as it reached the highest performance. However, this finding is limited to the actor-critic method, four state encoders, and evaluation simulators that do not debias the logged user data. In response to these shortcomings, we reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2) a different RL method, (3) more state encoders, and (4) a different dataset. Importantly, our experimental results indicate that the existing findings do not generalize to the debiased SOFA simulator generated from a different dataset and to a Deep Q-Network (DQN)-based method when compared against more state encoders.
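To make the framework's two components concrete, here is a minimal sketch in PyTorch of (1) an attention-based state encoder that pools a user's item-interaction history into a fixed-size state vector, and (2) a DQN-style Q-head that scores candidate items from that state. All class names, hyperparameters, and the pooling scheme are illustrative assumptions for exposition; this is not the paper's or the SOFA simulator's actual implementation.

```python
import torch
import torch.nn as nn


class AttentionStateEncoder(nn.Module):
    """Minimal attention-based state encoder (illustrative): embeds a user's
    item-interaction history and pools it into a fixed-size state vector."""

    def __init__(self, num_items: int, embed_dim: int = 64):
        super().__init__()
        # Item ID 0 is reserved as padding for variable-length histories.
        self.item_embedding = nn.Embedding(num_items + 1, embed_dim, padding_idx=0)
        self.attention = nn.Linear(embed_dim, 1)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len) of item IDs, with 0 as padding.
        emb = self.item_embedding(history)               # (batch, seq_len, embed_dim)
        scores = self.attention(emb).squeeze(-1)         # (batch, seq_len)
        scores = scores.masked_fill(history == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)          # attention over the history
        return (weights.unsqueeze(-1) * emb).sum(dim=1)  # (batch, embed_dim)


class DQNRecommender(nn.Module):
    """State encoder followed by a Q-head: one Q-value per candidate item,
    so recommending an item corresponds to the greedy action."""

    def __init__(self, num_items: int, embed_dim: int = 64):
        super().__init__()
        self.encoder = AttentionStateEncoder(num_items, embed_dim)
        self.q_head = nn.Linear(embed_dim, num_items)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        return self.q_head(self.encoder(history))  # (batch, num_items) Q-values


# Toy usage: recommend the item with the highest Q-value for one padded history.
model = DQNRecommender(num_items=100)
history = torch.tensor([[3, 17, 42, 0, 0]])  # hypothetical interaction history
action = model(history).argmax(dim=-1)       # greedy recommendation
```

In actual RL4Rec training, the Q-head would be fit with temporal-difference targets against a target network and replayed interactions; the sketch only shows the forward pass that turns a logged history into a recommendation.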
