Decentralized Multi-Agent Pursuit using Deep Reinforcement Learning

Paper title

Decentralized Multi-Agent Pursuit using Deep Reinforcement Learning

Authors

Souza Jr, Cristino de; Newbury, Rhys; Cosgun, Akansel; Castillo, Pedro; Vidolov, Boris; Kulic, Dana

Abstract

Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning for pursuing an omni-directional target with multiple, homogeneous agents that are subject to unicycle kinematic constraints. We use shared experience to train a policy for a given number of pursuers that is executed independently by each agent at run-time. The training benefits from curriculum learning, a sweeping-angle ordering to locally represent neighboring agents and encouraging good formations with reward structure that combines individual and group rewards. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach, with non-holonomic agents, performs on par with classical algorithms with omni-directional agents, and outperforms their non-holonomic adaptations. The learned policy is successfully transferred to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.
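As a rough illustration of two ideas named in the abstract, the sketch below shows a standard unicycle kinematic update and one plausible reading of an angle-based ordering of neighboring agents. It is not taken from the paper: the function names `unicycle_step` and `order_neighbors_by_angle`, the Euler integration, and the choice to sort neighbors by bearing relative to the agent's heading are all assumptions made for this example.

```python
import math


def unicycle_step(x, y, theta, v, omega, dt):
    """Advance a unicycle-model agent by one Euler step (assumed integrator).

    The agent can only move along its heading `theta` at speed `v`
    and rotate at rate `omega`; it cannot translate sideways.
    """
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    # Wrap the heading into [-pi, pi).
    theta_new = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return x_new, y_new, theta_new


def order_neighbors_by_angle(agent_pose, neighbors):
    """Sort neighbor positions by bearing relative to the agent's heading.

    Hypothetical stand-in for an angle-based neighbor ordering: bearings
    are measured counter-clockwise from the heading and lie in [0, 2*pi).
    """
    x, y, theta = agent_pose

    def bearing(p):
        return (math.atan2(p[1] - y, p[0] - x) - theta) % (2 * math.pi)

    return sorted(neighbors, key=bearing)
```

Sorting neighbors by angle gives each agent a consistent, local way to arrange the other pursuers in its observation vector, regardless of how they are indexed globally.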

Paper title