REALab: An Embedded Perspective on Tampering

Paper title

REALab: An Embedded Perspective on Tampering

Authors

Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Abstract

This paper describes REALab, a platform for embedded agency research in reinforcement learning (RL). REALab is designed to model the structure of tampering problems that may arise in real-world deployments of RL. Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards). This may be unrealistic in settings where agents are embedded and can corrupt the processes producing feedback (e.g., human supervisors, or an implemented reward function). We describe an alternative Corrupt Feedback MDP formulation and the REALab environment platform, which both avoid the secure feedback assumption. We hope the design of REALab provides a useful perspective on tampering problems, and that the platform may serve as a unit test for the presence of tampering incentives in RL agent designs.
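The contrast between a reward the agent can only receive and feedback produced by a process the agent can corrupt can be illustrated with a small toy model. This Python sketch is hypothetical. The names `State`, `step`, `secure_reward`, `observed_feedback`, `rollout` and `prefers_tampering` are invented for illustration. The sketch is neither the paper's Corrupt Feedback MDP formulation nor the REALab API. It assumes a two-action world: "work" produces true value, and "tamper" corrupts the in-environment feedback process. It then compares a reward computed outside the agent's reach with feedback read from a process the agent can modify.

```python
from dataclasses import dataclass

# Hypothetical toy model: NOT the paper's Corrupt Feedback MDP or the REALab API.
# "work" produces one unit of true value; "tamper" corrupts the feedback process,
# which (in the embedded view) lives inside the environment state.


@dataclass
class State:
    true_value: int = 0       # what the designer actually cares about
    tampered: bool = False    # whether the in-environment feedback process is corrupted


def step(state: State, action: str) -> State:
    """Return the next state; the agent can affect any part of it."""
    if action == "work":
        return State(state.true_value + 1, state.tampered)
    if action == "tamper":
        return State(state.true_value, True)
    raise ValueError(f"unknown action: {action}")


def secure_reward(prev: State, nxt: State) -> float:
    """Standard-MDP view: reward is computed outside the agent's reach."""
    return float(nxt.true_value - prev.true_value)


def observed_feedback(prev: State, nxt: State) -> float:
    """Embedded view: feedback is read from a process inside the environment.

    Once tampered, this hypothetical process reports a large constant
    regardless of what actually happened.
    """
    if nxt.tampered:
        return 10.0
    return float(nxt.true_value - prev.true_value)


def always_work(t: int) -> str:
    return "work"


def tamper_first(t: int) -> str:
    return "tamper" if t == 0 else "work"


def rollout(policy, feedback_fn, horizon: int = 5):
    """Run a time-indexed policy; return (total feedback, final true value)."""
    state, total = State(), 0.0
    for t in range(horizon):
        nxt = step(state, policy(t))
        total += feedback_fn(state, nxt)
        state = nxt
    return total, state.true_value


def prefers_tampering(feedback_fn, horizon: int = 5) -> bool:
    """Toy check: does maximizing this feedback favor the tampering policy?"""
    honest, _ = rollout(always_work, feedback_fn, horizon)
    tamper, _ = rollout(tamper_first, feedback_fn, horizon)
    return tamper > honest


if __name__ == "__main__":
    print(rollout(always_work, secure_reward), rollout(tamper_first, secure_reward))
    print(rollout(always_work, observed_feedback), rollout(tamper_first, observed_feedback))
    print(prefers_tampering(secure_reward), prefers_tampering(observed_feedback))
```

With the default horizon of 5, the honest policy collects 5 under either signal. The tampering policy collects 4 under the secure reward but 50 under the corruptible feedback, while producing only 4 units of true value. A check like `prefers_tampering` is at most a loose analogue of the abstract's idea of a unit test for tampering incentives.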
