Paper Title
How to talk so AI will learn: Instructions, descriptions, and autonomy
Paper Authors
Paper Abstract
From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such language use. To address this challenge, we formalize learning from language in a contextual bandit setting and ask how a human might communicate preferences over behaviors. We study two distinct types of language: $\textit{instructions}$, which provide information about the desired policy, and $\textit{descriptions}$, which provide information about the reward function. We show that the agent's degree of autonomy determines which form of language is optimal: instructions are better in low-autonomy settings, but descriptions are better when the agent will need to act independently. We then define a pragmatic listener agent that robustly infers the speaker's reward function by reasoning about $\textit{how}$ the speaker expresses themselves. We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions. Finally, we show that this form of social learning can integrate with and reduce regret in traditional reinforcement learning. We hope these insights facilitate a shift from developing agents that $\textit{obey}$ language to agents that $\textit{learn}$ from it.
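To make the speaker/listener framing concrete, below is a minimal, illustrative sketch of an RSA-style pragmatic listener in a toy contextual bandit. This is not the paper's model or code: the feature map, the hypothesis space of reward vectors, the single-feature "description" utterances, and the rationality parameter `alpha` are all invented assumptions for illustration. It only shows the general pattern the abstract describes: a speaker who comments on the reward function, and a listener who inverts that speaker via Bayes' rule to recover a posterior over reward functions.

```python
# Illustrative sketch only; all names (PHI, UTTERANCES, alpha, ...) are assumptions,
# not the paper's implementation.
import itertools
import numpy as np

# Toy contextual bandit: each (context, action) pair has a binary feature vector,
# and the true reward is the dot product of an unknown weight vector with that vector.
PHI = {
    "ctx0": {"a0": np.array([1, 0]), "a1": np.array([0, 1])},
    "ctx1": {"a0": np.array([1, 1]), "a1": np.array([0, 0])},
}
CONTEXTS = list(PHI)
ACTIONS = ["a0", "a1"]

# Hypothesis space of reward functions: weight vectors in {-1, 0, +1}^2.
REWARDS = [np.array(w) for w in itertools.product([-1, 0, 1], repeat=2)]

# "Descriptions" talk about the reward function directly: (feature index, sign),
# e.g. (0, +1) reads as "feature 0 is good".
UTTERANCES = [(i, s) for i in range(2) for s in (+1, -1)]


def literal_listener(utterance):
    """L0: uniform posterior over reward vectors consistent with the description."""
    i, s = utterance
    consistent = np.array([1.0 if np.sign(w[i]) == s else 0.0 for w in REWARDS])
    return consistent / consistent.sum()


def speaker(true_w, alpha=4.0):
    """S1: softmax-rational speaker who picks the description that makes the
    literal listener act well, averaged over contexts."""
    utilities = []
    for u in UTTERANCES:
        post = literal_listener(u)
        w_hat = sum(p * w for p, w in zip(post, REWARDS))  # L0's posterior mean
        # L0 acts greedily under w_hat; the speaker's utility is the true reward earned.
        value = np.mean([
            true_w @ PHI[c][max(ACTIONS, key=lambda a: w_hat @ PHI[c][a])]
            for c in CONTEXTS
        ])
        utilities.append(value)
    logits = alpha * np.array(utilities)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def pragmatic_listener(utterance):
    """L1: invert the speaker via Bayes' rule to infer the reward function."""
    u_idx = UTTERANCES.index(utterance)
    likelihood = np.array([speaker(w)[u_idx] for w in REWARDS])
    prior = np.ones(len(REWARDS)) / len(REWARDS)
    post = likelihood * prior
    return post / post.sum()


if __name__ == "__main__":
    # Hearing "feature 1 is good", the pragmatic listener shifts probability mass
    # toward reward vectors with a positive weight on feature 1.
    for w, p in zip(REWARDS, pragmatic_listener((1, +1))):
        print(w, round(float(p), 3))
```

In this sketch, descriptions constrain the reward function across all contexts, which is why they generalize when the agent must act autonomously; an "instruction" utterance would instead pin down the best action in a single context, which is more useful when the agent only needs to follow along.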