Paper Title

Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments

Paper Authors

Mohammad Etemad, Nader Zare, Mahtab Sarvmaili, Amilcar Soares, Bruno Brandoli Machado, Stan Matwin

Paper Abstract

Unmanned Surface Vehicle (USV) technology is an exciting topic that essentially deploys an algorithm to safely and efficiently perform a mission. Although reinforcement learning is a well-known approach to modeling such a task, instability and divergence may occur when combining off-policy learning with function approximation. In this work, we used deep reinforcement learning, combining Q-learning with a neural representation, to avoid instability. Our methodology uses deep Q-learning and combines it with a rolling wave planning approach from agile methodology. Our method contains two critical parts in order to perform missions in an unknown environment. The first is a path planner that is responsible for generating a potentially effective path to a destination without considering the details of the route. The second is a decision-making module that is responsible for short-term decisions on avoiding obstacles during the near-future steps of USV exploitation within the context of the value function. Simulations were performed using two algorithms: a basic vanilla vessel navigator (VVN) as a baseline and an improved vessel navigator with a planner and local view (VNPLV). Experimental results show that the proposed method enhanced the performance of VVN by 55.31 on average for long-distance missions. Our model successfully demonstrated obstacle avoidance by means of deep reinforcement learning using adaptively planned paths in unknown environments.
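To make the two-module design described in the abstract more concrete (a coarse path planner plus a DQN-based local decision module), the Python snippet below is a minimal sketch, not the authors' implementation. It assumes a 2D grid world with four movement actions; all names (plan_waypoints, QNet, DQNAgent), the state encoding, and the hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch only (not the authors' code): a coarse waypoint planner
# plus a DQN-style decision module for short-horizon obstacle avoidance.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right on a 2D grid


def plan_waypoints(start, goal, step=5):
    """Rolling-wave-style planner: coarse intermediate waypoints toward the goal,
    ignoring obstacle details (those are left to the local decision module)."""
    start, goal = np.array(start, float), np.array(goal, float)
    n = max(1, int(np.linalg.norm(goal - start) // step))
    return [tuple(np.round(start + (goal - start) * k / n).astype(int))
            for k in range(1, n + 1)]


class QNet(nn.Module):
    """Small MLP mapping a local-view state vector to Q-values for the 4 actions."""
    def __init__(self, state_dim, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class DQNAgent:
    """Decision module: epsilon-greedy action selection plus replay-buffer updates."""
    def __init__(self, state_dim, gamma=0.99, lr=1e-3, eps=0.1):
        self.q = QNet(state_dim)
        self.opt = optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)  # stores (s, a, r, s2, done) tuples
        self.gamma, self.eps = gamma, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        with torch.no_grad():
            qvals = self.q(torch.tensor(state, dtype=torch.float32))
        return int(qvals.argmax().item())

    def update(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2, done = zip(*random.sample(self.buffer, batch_size))
        s = torch.tensor(np.array(s), dtype=torch.float32)
        a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.tensor(r, dtype=torch.float32)
        s2 = torch.tensor(np.array(s2), dtype=torch.float32)
        done = torch.tensor(done, dtype=torch.float32)
        q = self.q(s).gather(1, a).squeeze(1)
        with torch.no_grad():
            target = r + self.gamma * (1 - done) * self.q(s2).max(1).values
        loss = nn.functional.mse_loss(q, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()


if __name__ == "__main__":
    waypoints = plan_waypoints((0, 0), (20, 20))
    print("planned waypoints:", waypoints)
    agent = DQNAgent(state_dim=6)  # e.g. local obstacle view + offset to next waypoint
    dummy_state = np.random.rand(6).astype(np.float32)
    print("chosen action:", ACTIONS[agent.act(dummy_state)])
```

In this sketch the planner supplies medium-horizon waypoints toward the destination, while the DQN agent makes the short-term, value-function-driven decisions that steer around obstacles, mirroring the rolling-wave split between global planning and local decision making that the abstract describes.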
