Robot Action Selection Learning via Layered Dimension Informed Program Synthesis

Paper Title

Robot Action Selection Learning via Layered Dimension Informed Program Synthesis

Authors

Jarrett Holtz, Arjun Guha, Joydeep Biswas

Abstract

Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world, and second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS): by reasoning about the physical dimensions of state variables, and dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) requires two orders of magnitude fewer training examples than a comparable NN representation, and 3) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
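The idea of pruning candidate expressions with dimensional constraints on operators can be illustrated with a small sketch. This is not the authors' implementation or DSL. The state variables (`gap`, `ego_speed`, `lead_speed`, `headway`), the two-exponent dimension encoding, and the depth-1 enumeration are all assumptions chosen for illustration.

```python
from itertools import product

# A dimension is (length exponent, time exponent); e.g. speed is (1, -1).
LENGTH, TIME, SPEED = (1, 0), (0, 1), (1, -1)

# Hypothetical state variables for a driving-style domain (assumed names).
STATE_VARS = {
    "gap": LENGTH,
    "ego_speed": SPEED,
    "lead_speed": SPEED,
    "headway": TIME,
}

def combine(op, d1, d2):
    """Return the dimension of `d1 op d2`, or None if the operation is ill-typed."""
    if op in ("+", "-"):
        # Addition and subtraction require identical dimensions.
        return d1 if d1 == d2 else None
    if op == "*":
        return (d1[0] + d2[0], d1[1] + d2[1])
    if op == "/":
        return (d1[0] - d2[0], d1[1] - d2[1])
    raise ValueError(f"unknown operator: {op}")

def enumerate_features(target_dim=None):
    """List depth-1 expressions `a op b`, optionally keeping only one dimension."""
    found = []
    pairs = product(STATE_VARS.items(), STATE_VARS.items(), "+-*/")
    for (a, dim_a), (b, dim_b), op in pairs:
        dim = combine(op, dim_a, dim_b)
        if dim is None:
            continue  # pruned: operands violate the operator's constraint
        if target_dim is not None and dim != target_dim:
            continue  # pruned: result cannot be compared to the threshold
        found.append((f"({a} {op} {b})", dim))
    return found

all_exprs = enumerate_features()
time_exprs = enumerate_features(TIME)  # candidates comparable to a time threshold

if __name__ == "__main__":
    print(len(all_exprs), "well-typed expressions")
    for text, _ in time_exprs:
        print(text)
```

In this toy setting, an ill-typed expression such as `(gap + ego_speed)` is never generated, and requiring a result in units of time keeps only a handful of candidates such as `(gap / ego_speed)`. A real synthesizer would search deeper expressions and fit thresholds to demonstrations.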
