The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Paper title

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Authors

Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

Abstract

Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that need to be attentively and intentionally provided. In this paper, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as reward, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task. We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.
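
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a deep neural network trained on facial reactions, whereas the sketch below swaps in a one-dimensional least-squares fit so that it stays self-contained. The scalar "reaction score", the function names, and all data values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_reaction_to_reward(reactions, rewards):
    """Stage 1 (toy): fit a least-squares line from a scalar reaction score to reward.

    Assumption: each human reaction has already been reduced to one number.
    """
    mx, my = mean(reactions), mean(rewards)
    var = sum((x - mx) ** 2 for x in reactions)
    if var == 0:
        raise ValueError("reaction scores must not all be identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(reactions, rewards)) / var
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def rank_events(predict, event_reactions):
    """Stage 2 (toy): rank events by mean predicted reward, best first."""
    scores = {
        event: mean(predict(x) for x in xs)
        for event, xs in event_reactions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical training data: reaction scores paired with known task rewards.
train_reactions = [-0.8, -0.5, 0.1, 0.6, 0.9]
train_rewards = [-1.0, -1.0, 0.0, 1.0, 1.0]
predict = fit_reaction_to_reward(train_reactions, train_rewards)

# Hypothetical reactions observed for three events whose rewards are unknown.
new_reactions = {
    "event_a": [0.7, 0.5],
    "event_b": [-0.6, -0.2],
    "event_c": [0.0, 0.1],
}
ranking = rank_events(predict, new_reactions)
print(ranking)
```

Here the learned mapping is used only to order events by predicted reward, which loosely mirrors the first evaluation listed in the abstract (inferring relative reward ranking); improving a policy or transferring to a new domain would need considerably more machinery than this sketch shows.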
