A General, Evolution-Inspired Reward Function for Social Robotics

Paper Title

A General, Evolution-Inspired Reward Function for Social Robotics

Author

Kingsford, Thomas

Abstract

The field of social robotics will likely need to depart from a paradigm of designed behaviours and imitation learning and adopt modern reinforcement learning (RL) methods to enable robots to interact fluidly and efficaciously with humans. In this paper, we present the Social Reward Function as a mechanism to provide (1) a real-time, dense reward function necessary for the deployment of RL agents in social robotics, and (2) a standardised objective metric for comparing the efficacy of different social robots. The Social Reward Function is designed to closely mimic the genetically endowed social perception capabilities of humans in an effort to provide a simple, stable and culture-agnostic reward function. Presently, datasets used in social robotics are either small or significantly out-of-domain with respect to social robotics. The use of the Social Reward Function will allow larger in-domain datasets to be collected close to the behaviour policy of social robots, which will allow further improvements both to reward functions and to the behaviour policies of social robots. We believe this will be the key enabler to developing efficacious social robots in the future.
