Achieving Goals using Reward Shaping and Curriculum Learning

Paper title

Achieving Goals using Reward Shaping and Curriculum Learning

Authors

Mihai Anca, Jonathan D. Thomas, Dabal Pedamonti, Matthew Studley, Mark Hansen

Abstract

Real-time control for robotics is a popular research area in the reinforcement learning community. Through the use of techniques such as reward shaping, researchers have managed to train online agents across a multitude of domains. Despite these advances, solving goal-oriented tasks still requires complex architectural changes or hard constraints to be placed on the problem. In this article, we solve the problem of stacking multiple cubes by combining curriculum learning, reward shaping, and a high number of efficiently parallelized environments. We introduce two curriculum learning settings that allow us to separate the complex task into sequential sub-goals, hence enabling the learning of a problem that may otherwise be too difficult. We focus on discussing the challenges encountered while implementing them in a goal-conditioned environment. Finally, we extend the best configuration identified on a higher complexity environment with differently shaped objects.
