Paper Title
Formalizing the Problem of Side Effect Regularization
Paper Authors
Paper Abstract
AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must weigh "how much of a mess they make" against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
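To make the stated trade-off concrete, here is a minimal sketch of a delayed-specification objective. The notation is illustrative rather than the paper's own: $T$ denotes the step at which the true objective is revealed, $R_{\mathrm{proxy}}$ the imperfect proxy reward, $\mathcal{D}$ the agent's distribution over candidate true objectives, and $V^{*}_{R}$ the optimal value function for objective $R$.

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} R_{\mathrm{proxy}}(s_t, a_t)\right] \;+\; \mathbb{E}_{\pi}\,\mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s_T) \right]
\]

Under this reading, the first term rewards proxy performance before the reveal, while the second term rewards reaching a state $s_T$ from which the agent can still do well on whichever objective is later revealed; maximizing their sum is the sense in which proxy reward is traded off against the ability to achieve a range of future tasks.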