Paper Title

Coping with the variability in humans reward during simulated human-robot interactions through the coordination of multiple learning strategies

Authors

Rémi Dromnelle, Benoît Girard, Erwan Renaudo, Raja Chatila, Mehdi Khamassi

Abstract

An important current challenge in Human-Robot Interaction (HRI) is to enable robots to learn on-the-fly from human feedback. However, humans show great variability in the way they reward robots. We propose to address this issue by enabling the robot to combine different learning strategies, namely model-based (MB) and model-free (MF) reinforcement learning. We simulate two HRI scenarios: a simple task where the human congratulates the robot for putting the right cubes in the right boxes, and a more complicated version of this task where the cubes have to be placed in a specific order. We show that our existing MB-MF coordination algorithm, previously tested in robot navigation, works well here without retuning its parameters. It achieves maximal performance while incurring the same minimal computational cost as MF alone. Moreover, the algorithm maintains robust performance regardless of the variability of the simulated human feedback, whereas each strategy alone is impacted by this variability. Overall, the results suggest a promising way to promote robot learning flexibility when facing variable human feedback.
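
To make the combination of strategies more concrete, below is a minimal Python sketch of one way an MB-MF coordination could be wired together: a tabular Q-learning expert (MF), a planning expert over a learned model (MB), and a meta-controller that arbitrates between them while a noisy "human" reward stands in for variable feedback. The entropy-plus-cost arbitration rule, the toy task dynamics, and every class and parameter name here (MFExpert, MBExpert, MetaController, mb_cost_penalty, etc.) are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch only: the arbitration rule and all parameter values below
# are assumptions, not the exact coordination algorithm of Dromnelle et al.

class MFExpert:
    """Model-free expert: tabular Q-learning with a softmax policy."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, beta=5.0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def update(self, s, a, r, s_next):
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

    def action_probs(self, s):
        q = self.beta * self.Q[s]
        p = np.exp(q - q.max())
        return p / p.sum()


class MBExpert:
    """Model-based expert: learns transition counts and a reward estimate,
    then plans with a few sweeps of value iteration."""
    def __init__(self, n_states, n_actions, gamma=0.9, beta=5.0, n_sweeps=20):
        self.counts = np.full((n_states, n_actions, n_states), 1e-3)  # pseudo-counts
        self.R = np.zeros((n_states, n_actions))
        self.gamma, self.beta, self.n_sweeps = gamma, beta, n_sweeps

    def update(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1.0
        self.R[s, a] += 0.1 * (r - self.R[s, a])  # running average of observed reward

    def action_probs(self, s):
        T = self.counts / self.counts.sum(axis=2, keepdims=True)  # learned transitions
        Q = np.zeros_like(self.R)
        for _ in range(self.n_sweeps):            # value iteration on the learned model
            Q = self.R + self.gamma * T @ Q.max(axis=1)
        q = self.beta * Q[s]
        p = np.exp(q - q.max())
        return p / p.sum()


def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())


class MetaController:
    """Picks, at each step, the expert whose action distribution is least uncertain,
    penalising the MB expert for its higher planning cost (hypothetical criterion)."""
    def __init__(self, mb_cost_penalty=0.2):
        self.mb_cost_penalty = mb_cost_penalty

    def select(self, p_mf, p_mb):
        if entropy(p_mf) <= entropy(p_mb) + self.mb_cost_penalty:
            return "MF", p_mf
        return "MB", p_mb


# Toy interaction loop standing in for the simulated cube-placement task:
# the "human" reward is delivered unreliably to mimic variable human feedback.
rng = np.random.default_rng(0)
n_states, n_actions = 6, 3
mf = MFExpert(n_states, n_actions)
mb = MBExpert(n_states, n_actions)
meta = MetaController()
s = 0
for step in range(200):
    expert, probs = meta.select(mf.action_probs(s), mb.action_probs(s))
    a = int(rng.choice(n_actions, p=probs))
    s_next = (s + a + 1) % n_states              # arbitrary toy dynamics
    r = 1.0 if s_next == n_states - 1 else 0.0
    if rng.random() < 0.3:                       # 30% of rewards are dropped
        r = 0.0
    mf.update(s, a, r, s_next)
    mb.update(s, a, r, s_next)
    s = s_next
```

Under these assumptions, the meta-controller falls back on the cheap MF expert whenever it is no more uncertain than the MB planner, which mirrors the abstract's claim of keeping computational cost close to MF alone while retaining MB flexibility.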
