Paper Title

Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous

Paper Authors

Wang, Rose E., Kew, J. Chase, Lee, Dennis, Lee, Tsang-Wei Edward, Zhang, Tingnan, Ichter, Brian, Tan, Jie, Faust, Aleksandra

Paper Abstract

Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point to point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
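The core loop the abstract describes (propose navigation subgoals, evaluate them with learned prediction models, pick the one that best completes the rendezvous) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prediction model here is a hypothetical stand-in (`predict_future_position`, a fixed move-toward-goal rule), whereas HPP learns motion predictions via self-supervision from noisy sensor inputs such as lidar, and the sampling scheme and scoring are simplified assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_future_position(agent_pos, subgoal, horizon=5, step=0.2):
    """Stand-in prediction model: assume the agent moves a fixed fraction
    of the way toward its subgoal each step. HPP instead learns each
    agent's dynamics from high-dimensional sensor data."""
    pos = np.asarray(agent_pos, dtype=float)
    goal = np.asarray(subgoal, dtype=float)
    for _ in range(horizon):
        pos = pos + step * (goal - pos)
    return pos

def choose_subgoal(my_pos, teammate_pos, teammate_goal_guess, n_samples=256):
    """Propose candidate subgoals around the agent and keep the one whose
    predicted outcome minimizes the inter-agent distance (rendezvous),
    using only predictions -- no explicit communication."""
    candidates = np.asarray(my_pos) + rng.uniform(-3.0, 3.0, size=(n_samples, 2))
    teammate_future = predict_future_position(teammate_pos, teammate_goal_guess)
    scores = [np.linalg.norm(predict_future_position(my_pos, c) - teammate_future)
              for c in candidates]
    return candidates[int(np.argmin(scores))]
```

For example, an agent at the origin whose teammate sits to its right will select a subgoal in the teammate's direction, since those candidates yield the smallest predicted separation.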
