Paper Title
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
Paper Authors
Paper Abstract
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism. We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure. For a natural class of corruption functions, decoupled approval algorithms have aligned incentives both at convergence and for their local updates. Empirically, they also scale to complex 3D environments where tampering is possible.
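To make the decoupling concrete, below is a minimal sketch of the idea behind a decoupled-approval policy-gradient step. It is not the paper's exact algorithm: the names (`decoupled_approval_update`, `approval_fn`), the single-step setup, the toy linear policy, and the omission of a baseline term are all illustrative assumptions. What it shows is the core mechanism: the action executed in the environment and the action submitted for approval are sampled independently from the same policy.

```python
import torch


def decoupled_approval_update(policy, optimizer, state, approval_fn):
    """One illustrative decoupled-approval policy-gradient step.

    The action executed in the environment and the action submitted for
    approval are sampled independently, so the executed action cannot be
    chosen to manipulate the feedback it is itself judged by.
    """
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)

    executed_action = dist.sample()  # acts in the (possibly tamperable) environment
    query_action = dist.sample()     # independently sampled action sent for approval

    # Feedback is collected only for the decoupled query action.
    approval = approval_fn(state, query_action)

    # REINFORCE-style update on the query action, weighted by its approval.
    loss = -dist.log_prob(query_action) * approval
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return executed_action


# Hypothetical usage: a toy linear policy (4-dim state, 3 actions) and a
# stand-in approval oracle; a real supervisor or reward model would go here.
policy = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)
state = torch.zeros(4)
decoupled_approval_update(policy, optimizer, state, approval_fn=lambda s, a: 1.0)
```

Because the approved action is drawn independently of the executed one, the update does not reward the agent for steering its own feedback channel, which is the incentive-alignment property the abstract refers to for the class of corruption functions considered in the paper.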