Paper Title

Pessimism About Unknown Unknowns Inspires Conservatism

Paper Authors

Michael K. Cohen, Marcus Hutter

Paper Abstract

If we could define the set of all bad outcomes, we could hard-code an agent which avoids them; however, in sufficiently complex environments, this is infeasible. We do not know of any general-purpose approaches in the literature to avoiding novel failure modes. Motivated by this, we define an idealized Bayesian reinforcement learner which follows a policy that maximizes the worst-case expected reward over a set of world-models. We call this agent pessimistic, since it optimizes assuming the worst case. A scalar parameter tunes the agent's pessimism by changing the size of the set of world-models taken into account. Our first main contribution is: given an assumption about the agent's model class, a sufficiently pessimistic agent does not cause "unprecedented events" with probability $1-\delta$, whether or not designers know how to precisely specify those precedents they are concerned with. Since pessimism discourages exploration, at each timestep, the agent may defer to a mentor, who may be a human or some known-safe policy we would like to improve. Our other main contribution is that the agent's policy's value approaches at least that of the mentor, while the probability of deferring to the mentor goes to 0. In high-stakes environments, we might like advanced artificial agents to pursue goals cautiously, which is a non-trivial problem even if the agent were allowed arbitrary computing power; we present a formal solution.
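As a concrete illustration of the decision rule described in the abstract, here is a minimal Python sketch that selects an action by maximizing the worst-case expected reward over a high-posterior set of world-models. All names here (`pessimistic_action`, `posterior`, `q_values`, `beta`) are illustrative assumptions rather than the paper's notation: the paper's agent is an idealized Bayesian learner, and a precomputed value table per world-model stands in for its intractable planning step.

```python
import numpy as np

def pessimistic_action(posterior, q_values, beta):
    """Choose the action maximizing worst-case expected reward.

    posterior -- shape (n_models,), posterior weights over world-models
    q_values  -- shape (n_models, n_actions), expected return of each
                 action under each world-model (assumed precomputed)
    beta      -- pessimism parameter in (0, 1]; larger values retain
                 more world-models and so induce more caution
    """
    # Retain the highest-posterior models until their combined mass
    # reaches beta -- this is the set of world-models whose worst
    # case the agent optimizes.
    order = np.argsort(posterior)[::-1]
    cumulative = np.cumsum(posterior[order])
    top_k = int(np.searchsorted(cumulative, beta)) + 1
    model_set = order[:top_k]

    # Worst-case value of each action across the retained set,
    # then act greedily with respect to that worst case.
    worst_case = q_values[model_set].min(axis=0)
    return int(np.argmax(worst_case))


# Example: three world-models, two actions. Model 2 (posterior 0.1)
# predicts that action 0 is disastrous.
posterior = np.array([0.6, 0.3, 0.1])
q_values = np.array([[1.0, 0.5],
                     [0.9, 0.6],
                     [-5.0, 0.4]])
print(pessimistic_action(posterior, q_values, beta=0.95))  # -> 1 (cautious)
print(pessimistic_action(posterior, q_values, beta=0.8))   # -> 0 (less pessimistic)
```

Raising `beta` enlarges the retained model set, so the minimum is taken over more hypotheses and the agent behaves more conservatively; the full agent described in the paper additionally defers to a mentor rather than explore on its own.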
