Paper Title
Accounting for Human Learning when Inferring Human Preferences
Paper Authors
Paper Abstract
Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is, that their policy $π$ does not change over time. In practice, humans interacting with a novel environment or performing a novel task will change their demonstrations as they learn more about the environment or task. We investigate the consequences of relaxing this stationarity assumption, in particular by modelling the human as learning. Surprisingly, we find in some small examples that this can lead to better inference than if the human were stationary. That is, by observing a demonstrator who is themselves learning, a machine can infer more than by observing a demonstrator who is noisily rational. In addition, we find evidence that misspecification can lead to poor inference, suggesting that modelling human learning is important, especially when the human is facing an unfamiliar environment.
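To make the abstract's central idea concrete, here is a minimal, hypothetical sketch (not taken from the paper's experiments): a two-armed bandit where an observer infers which arm the demonstrator prefers, modelling the demonstrator either as stationary and Boltzmann-rational on the hypothesised rewards, or as a learner whose Q-learning updates are replayed under each hypothesis. All names and parameters (`demo_learning`, `BETA`, the learning rate `0.5`) are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Assumed toy setup: a 2-armed bandit with true mean rewards [1.0, 0.0].
TRUE_R = [1.0, 0.0]
BETA = 3.0  # softmax rationality parameter of the modelled demonstrator


def softmax_choice_prob(q, a, beta=BETA):
    """Probability that a Boltzmann agent with values q picks arm a."""
    exps = [math.exp(beta * v) for v in q]
    return exps[a] / sum(exps)


def demo_learning(T=50, alpha=0.5):
    """Simulate a non-stationary demonstrator: starts ignorant of the
    arm values and Q-learns them while choosing via Boltzmann exploration."""
    q = [0.0, 0.0]
    actions = []
    for _ in range(T):
        probs = [softmax_choice_prob(q, a) for a in (0, 1)]
        a = 0 if random.random() < probs[0] else 1
        actions.append(a)
        q[a] += alpha * (TRUE_R[a] - q[a])  # value update from observed reward
    return actions


def log_lik(actions, hyp_r, model):
    """Log-likelihood of an action sequence under a hypothesised reward
    vector hyp_r.  'stationary' models the demonstrator as Boltzmann on
    hyp_r throughout; 'learning' replays the same Q-learning updates the
    demonstrator is assumed to make, but with rewards from hyp_r."""
    q = list(hyp_r) if model == "stationary" else [0.0, 0.0]
    ll = 0.0
    for a in actions:
        ll += math.log(softmax_choice_prob(q, a))
        if model == "learning":
            q[a] += 0.5 * (hyp_r[a] - q[a])
    return ll


actions = demo_learning()
for model in ("stationary", "learning"):
    ll_correct = log_lik(actions, [1.0, 0.0], model)  # true reward hypothesis
    ll_wrong = log_lik(actions, [0.0, 1.0], model)    # swapped hypothesis
    print(model, "prefers correct hypothesis:", ll_correct > ll_wrong)
```

Under the learning model, the replayed value estimates track the demonstrator's actual trajectory, so the early exploratory actions carry evidence rather than just counting as noise; a misspecified model of the demonstrator can correspondingly distort the inferred preferences.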