Paper Title
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
Paper Authors
Paper Abstract
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism. We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure. For a natural class of corruption functions, decoupled approval algorithms have aligned incentives both at convergence and for their local updates. Empirically, they also scale to complex 3D environments where tampering is possible.
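To make the decoupling concrete, below is a minimal sketch of the idea behind a decoupled-approval policy-gradient step. It is not the paper's exact algorithm: the names (`decoupled_approval_update`, `approval_fn`), the single-step setup, the toy linear policy, and the omission of a baseline term are all illustrative assumptions. What it shows is the core mechanism: the action executed in the environment and the action submitted for approval are sampled independently from the same policy.

```python
import torch


def decoupled_approval_update(policy, optimizer, state, approval_fn):
    """One illustrative decoupled-approval policy-gradient step.

    The action executed in the environment and the action submitted for
    approval are sampled independently, so the executed action cannot be
    chosen to manipulate the feedback it is itself judged by.
    """
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)

    executed_action = dist.sample()  # acts in the (possibly tamperable) environment
    query_action = dist.sample()     # independently sampled action sent for approval

    # Feedback is collected only for the decoupled query action.
    approval = approval_fn(state, query_action)

    # REINFORCE-style update on the query action, weighted by its approval.
    loss = -dist.log_prob(query_action) * approval
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return executed_action


# Hypothetical usage: a toy linear policy (4-dim state, 3 actions) and a
# stand-in approval oracle; a real supervisor or reward model would go here.
policy = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)
state = torch.zeros(4)
decoupled_approval_update(policy, optimizer, state, approval_fn=lambda s, a: 1.0)
```

Because the approved action is drawn independently of the executed one, the update does not reward the agent for steering its own feedback channel, which is the incentive-alignment property the abstract refers to for the class of corruption functions considered in the paper.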