Paper Title

Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents

Paper Authors

Hafiz, Abdul Mueed; Bhat, Ghulam Mohiuddin

Paper Abstract

Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate. The learning, however, is specific to each agent, and communication has to be satisfactorily designed for the agents. As more complex Deep Q-Networks come to the fore, the overall complexity of the multi-agent system increases, leading to issues such as difficulty in training, the need for more resources and training time, and difficulty in fine-tuning. To address these issues, we propose a simple but efficient DQN-based MAS for RL that uses shared states and rewards, but agent-specific actions, to update the experience replay pools of the DQNs, where each agent is a DQN. The benefits of the approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. It should be noted that the method can be extended to any DQN. As such, we use a simple DQN and a DDQN (Double Q-learning) respectively on three separate tasks: CartPole-v1 (OpenAI Gym environment), LunarLander-v2 (OpenAI Gym environment), and Maze Traversal (a custom environment). The proposed approach outperforms the baselines on these tasks by decent margins.
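The abstract describes the core mechanism: each agent is a binary-action DQN, all agents observe the same state and receive the same shared reward, but each agent stores only its own action bit in its own experience replay pool, and the agents' bits together encode one joint discrete action. Below is a minimal Python sketch of that scheme, assuming PyTorch and the classic OpenAI Gym API. The QNet/BinaryAgent names, the bit-to-action encoding for LunarLander-v2, the hyperparameters, and the omission of a target network are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNet(nn.Module):
    """Small MLP with two outputs: Q(s, 0) and Q(s, 1) for a binary-action agent."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x):
        return self.net(x)


class BinaryAgent:
    """One agent = one DQN over {0, 1} with its own experience replay pool."""

    def __init__(self, state_dim, lr=1e-3, buf_size=50_000):
        self.q = QNet(state_dim)
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.replay = deque(maxlen=buf_size)

    def act(self, state, eps):
        # Epsilon-greedy over the agent's two actions.
        if random.random() < eps:
            return random.randrange(2)
        with torch.no_grad():
            return int(self.q(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def remember(self, s, a, r, s2, done):
        # Shared state, reward, and next state; agent-specific action bit.
        self.replay.append((s, a, r, s2, done))

    def learn(self, batch=64, gamma=0.99):
        if len(self.replay) < batch:
            return
        s, a, r, s2, d = zip(*random.sample(self.replay, batch))
        s = torch.as_tensor(np.array(s), dtype=torch.float32)
        s2 = torch.as_tensor(np.array(s2), dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64)
        r = torch.as_tensor(r, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)
        pred = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Plain DQN target (no target network, for brevity).
            target = r + gamma * (1 - d) * self.q(s2).max(1).values
        loss = F.mse_loss(pred, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()


# Two binary agents jointly select one of 2**2 = 4 discrete actions;
# the bit-to-action mapping below is a hypothetical encoding for LunarLander-v2.
env = gym.make("LunarLander-v2")
agents = [BinaryAgent(env.observation_space.shape[0]) for _ in range(2)]

for episode in range(500):
    state = env.reset()  # classic gym API; gymnasium returns (obs, info) instead
    eps = max(0.05, 0.5 * 0.99 ** episode)
    done = False
    while not done:
        bits = [agent.act(state, eps) for agent in agents]
        next_state, reward, done, _ = env.step(bits[0] * 2 + bits[1])
        for agent, bit in zip(agents, bits):
            # Shared (state, reward, next_state); each agent stores its own bit.
            agent.remember(state, bit, reward, next_state, float(done))
            agent.learn()
        state = next_state
```

Since the abstract notes the method extends to any DQN, swapping the target computation above for a Double Q-learning target (as the paper also evaluates with DDQN) would only change the target line; the shared-state, shared-reward, agent-specific-action replay update stays the same.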
