Paper Title
Maximizing BCI Human Feedback using Active Learning
Paper Authors
Paper Abstract
Recent advancements in \textit{Learning from Human Feedback} present an effective way to train robot agents via inputs from non-expert humans, without the need for a specially designed reward function. However, this approach requires a human to be present and attentive during robot learning to provide evaluative feedback. In addition, the amount of feedback needed grows with the level of task difficulty, and the quality of human feedback may decrease over time due to fatigue. To overcome these limitations and enable learning of more complex robot tasks, there is a need to maximize the quality of the expensive feedback received and to reduce the amount of human cognitive involvement required. In this work, we present an approach that uses active learning to intelligently select queries for the human supervisor based on the robot's uncertainty, effectively reducing the amount of feedback needed to learn a given task. We also use a novel multiple-buffer system to improve robustness to feedback noise and guard against catastrophic forgetting as the robot's learning evolves. This makes it possible to learn more complex tasks with smaller amounts of human feedback compared to previous methods. We demonstrate the utility of our proposed method on a robot arm reaching task, in which the robot learns to reach a location in 3D without colliding with obstacles. Our approach learns this task faster, with less human feedback and cognitive involvement, than previous methods that do not use active learning.
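The abstract describes uncertainty-driven query selection only at a high level. As an illustration of the general idea (not the authors' implementation), the minimal sketch below uses disagreement among an ensemble of reward predictors as the uncertainty signal and forwards only the most uncertain candidates to the human; all class names, parameters, and thresholds are hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's code): query the human only for the
# state-action candidates on which an ensemble of reward predictors
# disagrees most, i.e. where the robot is most uncertain.

class EnsembleRewardModel:
    """Toy ensemble: each member is a random linear scorer over features."""

    def __init__(self, n_members: int, feature_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(n_members, feature_dim))

    def predict(self, features: np.ndarray) -> np.ndarray:
        # One scalar reward estimate per ensemble member.
        return self.weights @ features

    def uncertainty(self, features: np.ndarray) -> float:
        # Disagreement (standard deviation) across members serves as the
        # uncertainty signal used to prioritize queries.
        return float(np.std(self.predict(features)))


def select_queries(model: EnsembleRewardModel, candidates, budget: int):
    """Return the `budget` most uncertain candidates to show the human."""
    return sorted(candidates, key=model.uncertainty, reverse=True)[:budget]


if __name__ == "__main__":
    model = EnsembleRewardModel(n_members=5, feature_dim=8)
    rng = np.random.default_rng(1)
    candidates = [rng.normal(size=8) for _ in range(100)]
    queries = select_queries(model, candidates, budget=10)
    print(f"Selected {len(queries)} high-uncertainty queries for human feedback")
```

In a full system, feedback gathered on the selected queries would be stored and replayed (e.g., via the multiple-buffer scheme the abstract mentions) while the reward model is updated; the sketch above covers only the query-selection step.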