Efficiently Guiding Imitation Learning Agents with Human Gaze

Paper title

Efficiently Guiding Imitation Learning Agents with Human Gaze

Authors

Akanksha Saran, Ruohan Zhang, Elaine Schaertl Short, Scott Niekum

Abstract

Human gaze is known to be an intention-revealing signal in human demonstrations of tasks. In this work, we use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods: behavioral cloning (BC), behavioral cloning from observation (BCO), and Trajectory-ranked Reward EXtrapolation (T-REX). Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel approach for utilizing gaze data in a computationally efficient manner, as part of an auxiliary loss function, which guides a network to have higher activations in image regions where the human's gaze fixated. This work is a step towards augmenting any existing convolutional imitation learning agent's training with auxiliary gaze data. Our auxiliary coverage-based gaze loss (CGL) guides learning toward a better reward function or policy, without adding any learnable parameters and without requiring gaze data at test time. We find that our proposed approach improves performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over 20 different Atari games. We also find that, compared to a prior state-of-the-art imitation learning method assisted by human gaze (AGIL), our method achieves better performance and learns more efficiently from fewer demonstrations. We further interpret trained CGL agents with a saliency map visualization method to explain their performance. Finally, we show that CGL can help alleviate a well-known causal confusion problem in imitation learning.
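The abstract does not give the exact form of the coverage-based gaze loss, so the following is only a minimal NumPy sketch of the general idea of an auxiliary loss that rewards higher activations where the human looked. Everything in it is an assumption for illustration: the function names `coverage_gaze_loss` and `total_loss`, the one-sided KL-style penalty, the channel-summing step, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np

def coverage_gaze_loss(activations, gaze_map, eps=1e-8):
    """Hypothetical auxiliary loss (not the paper's exact formula).

    activations: array of shape (C, H, W) from a convolutional layer.
    gaze_map:    array of shape (H, W), non-negative human gaze heatmap.
    Penalizes only cells where normalized gaze mass exceeds normalized
    activation mass, so extra activation elsewhere is not punished.
    """
    # Collapse channels into one spatial map of activation magnitude.
    act = np.abs(activations).sum(axis=0)
    # Normalize both maps so they can be compared as distributions.
    act = act / (act.sum() + eps)
    gaze = gaze_map / (gaze_map.sum() + eps)
    # One-sided, KL-style penalty on cells the network "under-covers".
    under = gaze > act
    terms = np.where(under, gaze * np.log((gaze + eps) / (act + eps)), 0.0)
    return float(terms.sum())

def total_loss(task_loss, activations, gaze_map, alpha=0.1):
    """Task loss (e.g. a BC loss) plus the weighted auxiliary gaze term.

    alpha is an assumed weighting hyperparameter.
    """
    return task_loss + alpha * coverage_gaze_loss(activations, gaze_map)

# Toy example: gaze fixates on one cell of a 4x4 grid.
gaze = np.zeros((4, 4))
gaze[1, 1] = 1.0
aligned = gaze[None, :, :].copy()   # activations peak where gaze is
misaligned = np.zeros((1, 4, 4))
misaligned[0, 3, 3] = 1.0           # activations peak elsewhere

print(coverage_gaze_loss(aligned, gaze))     # no penalty expected
print(coverage_gaze_loss(misaligned, gaze))  # large penalty expected
```

Because the term depends only on existing activations and the recorded gaze heatmap, a loss of this shape adds no learnable parameters and is needed only during training, which matches the two properties the abstract claims for CGL.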
