Paper Title
Simple Sensor Intentions for Exploration
Paper Authors
Paper Abstract
Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks are defined via simple sparse rewards, and exploration is facilitated via agent-internal auxiliary tasks. We introduce the idea of simple sensor intentions (SSIs) as a generic way to define auxiliary tasks. SSIs reduce the amount of prior knowledge that is required to define suitable rewards. They can further be computed directly from raw sensor streams and thus do not require expensive and possibly brittle state estimation on real systems. We demonstrate that a learning system based on these rewards can solve complex robotic tasks in simulation and in real world settings. In particular, we show that a real robotic arm can learn to grasp and lift, and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used both as controller input and in the auxiliary reward definition.
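As a rough illustration of the kind of auxiliary reward the abstract describes, the sketch below computes a simple sensor intention directly from a raw camera image: the reward is the displacement of the mean pixel location of a chosen color channel between consecutive frames (e.g. "move the red blob upward"). The function names, the thresholding, and the particular statistic are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def color_channel_centroid(image, channel=0, threshold=0.5):
    """Mean (row, col) location of pixels where the chosen color channel is active.

    `image` is assumed to be an HxWx3 array with values in [0, 1].
    Returns the image center if no pixel exceeds the threshold.
    """
    mask = image[..., channel] > threshold
    if not mask.any():
        h, w = image.shape[:2]
        return np.array([h / 2.0, w / 2.0])
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

def ssi_reward(prev_image, image, channel=0, direction=1.0):
    """Hypothetical simple-sensor-intention reward computed from raw pixels:
    reward the agent for moving the color centroid along the image rows."""
    prev_c = color_channel_centroid(prev_image, channel)
    curr_c = color_channel_centroid(image, channel)
    # Signed displacement of the centroid along the row axis; flip `direction`
    # to obtain the complementary intention (move the blob the other way).
    return direction * (curr_c[0] - prev_c[0])
```

Such rewards require no object poses or calibrated state estimation; flipping the sign or choosing a different channel or axis yields a family of cheap auxiliary tasks that can drive exploration alongside the sparse goal reward.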