Paper Title
Plan-Space State Embeddings for Improved Reinforcement Learning
Paper Authors
Paper Abstract
Robot control problems are often structured around a policy function that maps state values to control values, but in many dynamic problems the observed state has a relationship with useful policy actions that is difficult to characterize. In this paper we present a new method for learning state embeddings from plans or other forms of demonstration such that the embedding space has a specified geometric relationship with the demonstrations. We present a novel variational framework for learning these embeddings that attempts to optimize trajectory linearity in the learned embedding space. We then show how these embedding spaces can be used to augment the robot state in reinforcement learning problems. We use kinodynamic planning to generate training trajectories for several example environments and train embedding spaces on them. We show empirically that observing a system in the learned embedding space improves the performance of policy gradient reinforcement learning algorithms, particularly by reducing the variance between training runs. Our technique is limited to environments where demonstration data is available, but it places no restrictions on how that data is collected. By creating an abstract representation of the robot state with meaningful geometry, our embedding technique provides a way to transfer domain knowledge from existing technologies, such as planning and control algorithms, into more flexible policy learning algorithms.
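
To make the variational framework concrete, here is a minimal sketch (not the authors' implementation) of a state encoder trained on demonstration trajectories. The specific loss form below, penalizing each embedded state's deviation from the midpoint of its temporal neighbors so that straight-line segments incur no penalty, is one assumed way to encourage "trajectory linearity"; the paper's exact objective may differ, and all names here are illustrative.

```python
# Hedged sketch: a variational state embedding with an assumed
# trajectory-linearity penalty. Not the paper's actual objective.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Maps raw robot states to a Gaussian posterior over embeddings."""
    def __init__(self, state_dim: int, embed_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, embed_dim)
        self.log_var = nn.Linear(hidden, embed_dim)

    def forward(self, s):
        h = self.net(s)
        return self.mu(h), self.log_var(h)

def linearity_loss(encoder: StateEncoder, traj: torch.Tensor, beta: float = 1e-3):
    """Variational loss over one demonstration trajectory.

    traj: tensor of shape (T, state_dim), T >= 3, e.g. from a
    kinodynamic planner.
    """
    mu, log_var = encoder(traj)
    # Reparameterized sample of the embedded trajectory.
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
    # Assumed linearity term: each z_t should lie near the midpoint of
    # its neighbors, so linear trajectories are penalty-free.
    midpoints = 0.5 * (z[:-2] + z[2:])
    lin = ((z[1:-1] - midpoints) ** 2).sum(dim=-1).mean()
    # Standard KL regularizer toward a unit Gaussian prior.
    kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).sum(dim=-1).mean()
    return lin + beta * kl
```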
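
Likewise, the state-augmentation step could look like the following sketch, assuming a Gym-style environment with a Box observation space; `StateEncoder` is the hypothetical module from the sketch above, and the choice to append the posterior mean is an assumption rather than the paper's stated design.

```python
# Hedged sketch: expose the learned embedding to a policy-gradient
# learner by concatenating it onto the raw observation.
import numpy as np
import gym
import torch

class EmbeddingAugmentedEnv(gym.ObservationWrapper):
    """Concatenates the learned embedding mean onto the raw state."""
    def __init__(self, env, encoder, embed_dim: int):
        super().__init__(env)
        self.encoder = encoder.eval()
        # Extend the observation space with unbounded embedding dims.
        low = np.concatenate([env.observation_space.low,
                              -np.inf * np.ones(embed_dim)])
        high = np.concatenate([env.observation_space.high,
                               np.inf * np.ones(embed_dim)])
        self.observation_space = gym.spaces.Box(low=low, high=high,
                                                dtype=np.float32)

    def observation(self, obs):
        with torch.no_grad():
            mu, _ = self.encoder(torch.as_tensor(obs, dtype=torch.float32))
        # The policy now observes (raw state, embedding).
        return np.concatenate([obs, mu.numpy()]).astype(np.float32)
```

Any standard policy gradient implementation can then be trained on the wrapped environment unchanged, which matches the abstract's claim that the embedding acts purely as an augmentation of the observed state.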