Learning predictive representations in autonomous driving to improve deep reinforcement learning

Paper title

Learning predictive representations in autonomous driving to improve deep reinforcement learning

Authors

Daniel Graves, Nhat M. Nguyen, Kimia Hassanzadeh, Jun Jin

Abstract

Reinforcement learning with a novel predictive representation is applied to autonomous driving for the task of driving between lane markings. Substantial benefits in performance and generalization are observed on unseen test roads, both in simulation and on a real Jackal robot. The predictive representation is learned with general value functions (GVFs), which provide out-of-policy, or counter-factual, predictions of future lane centeredness and road angle. These predictions form a compact representation of the agent's state that improves both online and offline reinforcement learning, yielding driving methods that generalize well to roads not in the training data. Experiments in simulation and in the real world demonstrate that predictive representations in reinforcement learning improve learning efficiency, smoothness of control, and generalization to roads the agent was never shown during training, including roads with damaged lane markings. Learning a predictive representation that consists of several predictions over different time scales, or discount factors, was found to substantially improve performance and smoothness of control. The Jackal robot was trained in a two-step process in which the predictive representation was learned first, followed by a batch reinforcement learning algorithm (BCQ), from data collected through both automated and human-guided exploration of the environment. We conclude that out-of-policy predictive representations with GVFs offer reinforcement learning many benefits in real-world problems.
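
To make the idea of "several predictions over different time scales" concrete, the following is a minimal sketch and not the authors' implementation. It computes, for a recorded sequence of a signal such as lane centeredness, the discounted sum of future values at several discount factors; these sums are the kind of quantity a GVF is trained to predict. The function name `discounted_predictions`, the example values, the chosen discount factors, and the `(1 - gamma)` normalization are all assumptions made for illustration.

```python
import numpy as np


def discounted_predictions(cumulants, gammas):
    """Discounted sum of future cumulants from the first step, one value per gamma.

    Hypothetical helper for illustration. The (1 - gamma) factor is an assumed
    normalization that keeps values at different time scales in a similar range.
    """
    cumulants = np.asarray(cumulants, dtype=float)
    steps = np.arange(len(cumulants))
    return np.array(
        [(1.0 - g) * np.sum(g ** steps * cumulants) for g in gammas]
    )


# Assumed example: lane centeredness decaying as the vehicle drifts off center.
lane_centeredness = [1.0, 0.8, 0.5, 0.2]
# Small gamma looks only a step or two ahead; large gamma looks further ahead.
gammas = [0.0, 0.5, 0.9]
features = discounted_predictions(lane_centeredness, gammas)
print(features)
```

Stacking the values for several discount factors gives a short feature vector that summarizes the near and far future of the signal, which is the sense in which the abstract calls the representation compact.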
