DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics

Paper title

DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics

Authors

Doncieux, Stephane; Bredeche, Nicolas; Le Goff, Léni; Girard, Benoît; Coninx, Alexandre; Sigaud, Olivier; Khamassi, Mehdi; Díaz-Rodríguez, Natalia; Filliat, David; Hospedales, Timothy; Eiben, A.; Duro, Richard

Abstract

Robots are still limited to controlled conditions that the robot designer knows in enough detail to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility: they can discover an appropriate behavior given either some demonstrations or a reward that guides exploration with a reinforcement learning algorithm. Reinforcement learning algorithms rely on the definition of state and action spaces that define reachable behaviors. Their adaptation capability critically depends on the representations of these spaces: small and discrete spaces result in fast learning, while large and continuous spaces are challenging and either require a long training period or prevent the robot from converging to an appropriate behavior. Besides the operational cycle of policy execution and the learning cycle, which works at a slower time scale to acquire new policies, we introduce the redescription cycle, a third cycle working at an even slower time scale to generate or adapt the required representations to the robot, its environment and the task. We introduce the challenges raised by this cycle and we present DREAM (Deferred Restructuring of Experience in Autonomous Machines), a developmental cognitive architecture to bootstrap this redescription process stage by stage, build new state representations with appropriate motivations, and transfer the acquired knowledge across domains or tasks or even across robots. We describe the results obtained so far with this approach and conclude with a discussion of the questions it raises in neuroscience.
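
The three cycles named in the abstract differ mainly in how often they run. The sketch below is only an illustration of that nesting, not the DREAM implementation: the function name `run_nested_cycles`, the `CycleCounters` class, and the periods `learn_every` and `redescribe_every` are all hypothetical, and each cycle body is reduced to a counter so the relative time scales are visible.

```python
from dataclasses import dataclass


@dataclass
class CycleCounters:
    """Counts how often each of the three cycles has run."""
    operational: int = 0
    learning: int = 0
    redescription: int = 0


def run_nested_cycles(total_steps, learn_every=10, redescribe_every=100):
    """Run three cycles at decreasing frequencies.

    The periods are illustrative assumptions, not values from the paper.
    """
    counters = CycleCounters()
    for step in range(1, total_steps + 1):
        # Operational cycle: execute the current policy for one step.
        counters.operational += 1
        if step % learn_every == 0:
            # Learning cycle (slower): acquire or update a policy.
            counters.learning += 1
        if step % redescribe_every == 0:
            # Redescription cycle (slowest): generate or adapt the
            # state and action representations used by learning.
            counters.redescription += 1
    return counters


if __name__ == "__main__":
    print(run_nested_cycles(1000))
```

With the assumed periods, 1000 operational steps trigger 100 learning updates and 10 redescription passes.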
