PALMER: Perception-Action Loop with Memory for Long-Horizon Planning

Paper Title

PALMER: Perception-Action Loop with Memory for Long-Horizon Planning

Authors

Beker, Onur; Mohammadi, Mohammad; Zamir, Amir

Abstract

To achieve autonomy in a priori unknown real-world scenarios, agents should be able to: i) act from high-dimensional sensory observations (e.g., images), ii) learn from past experience to adapt and improve, and iii) plan over long horizons. Classical planning algorithms (e.g., PRM, RRT) are proficient at long-horizon planning. Deep learning-based methods, in turn, can provide the representations needed for the other two requirements by modeling statistical contingencies between observations. To this end, we introduce a general-purpose planning algorithm called PALMER that combines classical sampling-based planning algorithms with learning-based perceptual representations. To train these perceptual representations, we combine Q-learning with contrastive representation learning to create a latent space in which the distance between the embeddings of two states captures how easily an optimal policy can traverse between them. To plan with these perceptual representations, we repurpose classical sampling-based planning algorithms to retrieve previously observed trajectory segments from a replay buffer and restitch them into approximately optimal paths that connect any given pair of start and goal states. This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning. The end result is an experiential framework for long-horizon planning that is significantly more robust and sample-efficient than existing methods.
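
The planning idea in the abstract (treating previously observed states as nodes of a sampling-based roadmap and searching it with a learned distance) can be illustrated with a minimal PRM-style sketch. It is not the authors' implementation: `plan_over_buffer`, the `radius` connection threshold, and the Euclidean `euclidean` metric are hypothetical stand-ins. In PALMER the distance would come from the learned latent space (which need not be symmetric, so edges here are directed), and the resulting path would be used to restitch stored trajectory segments rather than just list raw states.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for a learned latent distance (assumption, not PALMER's metric)."""
    return math.dist(a, b)


def plan_over_buffer(
    buffer: Sequence[Vector],
    start: Vector,
    goal: Vector,
    radius: float,
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> Optional[List[int]]:
    """Return node indices of the cheapest start->goal path, or None if unreachable.

    Node 0 is the start, nodes 1..len(buffer) are stored buffer states
    (buffer index = node index - 1), and node len(buffer) + 1 is the goal.
    A directed edge i->j exists when dist(i, j) <= radius (PRM-style local connectivity).
    """
    nodes: List[Vector] = [start, *buffer, goal]
    goal_idx = len(nodes) - 1

    # Build a directed roadmap; directed edges allow an asymmetric distance.
    edges: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(len(nodes)):
            if i != j:
                d = dist(nodes[i], nodes[j])
                if d <= radius:
                    edges[i].append((j, d))

    # Dijkstra's algorithm from the start node over distance-weighted edges.
    best: Dict[int, float] = {0: 0.0}
    parent: Dict[int, int] = {}
    heap: List[Tuple[float, int]] = [(0.0, 0)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal_idx:
            # Walk parent pointers back to the start, then reverse.
            path = [u]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return path[::-1]
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    return None
```

For example, with `buffer = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 10.0)]`, calling `plan_over_buffer(buffer, (0.0, 0.0), (4.0, 0.0), radius=1.5)` chains through the three nearby buffer states and returns `[0, 1, 2, 3, 5]`, while the distant state is never used. The radius trades off roadmap density against the risk of connecting states that a policy cannot actually traverse in one segment.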
