A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

Paper Title

A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

Authors

Zhu, Changxi; Leung, Ho-fung; Hu, Shuyue; Cai, Yi

Abstract

In the teacher-student framework, a more experienced agent (the teacher) helps accelerate the learning of another agent (the student) by suggesting actions to take in certain states. In cooperative multiagent reinforcement learning (MARL), where agents need to cooperate with one another, a student may fail to cooperate well with others even when following the teachers' suggested actions, because the policies of all agents keep changing before convergence. When the number of times agents can communicate with one another is limited (i.e., there is a budget constraint), an advising strategy that uses actions as advice may not be good enough. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under a budget constraint. In PSAF, each Q-learner can decide when to ask for Q-values and when to share its own Q-values. We perform experiments on three typical multiagent learning problems. Evaluation results show that PSAF outperforms existing advising methods under both unlimited and limited budgets, and we analyze the impact of advising actions versus sharing Q-values on the agents' learning.
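The abstract does not state PSAF's actual decision rules, so the following is only a minimal Python sketch, under stated assumptions, of the general idea it describes: tabular Q-learners that each hold a separate budget for asking and for sharing, and that exchange Q-values rather than single suggested actions. The class name `BudgetedQAgent`, the visit-count thresholds (`ask_threshold`, `share_threshold`), and the rule "copy the Q-values of the most experienced peer willing to share" are hypothetical illustrations, not the authors' method.

```python
import random
from collections import defaultdict


class BudgetedQAgent:
    """Tabular Q-learner that can ask for and share Q-values under a budget.

    The ask/share rules are illustrative heuristics (an assumption),
    NOT the decision rules used in PSAF.
    """

    def __init__(self, n_actions, ask_budget, share_budget,
                 alpha=0.1, gamma=0.95, epsilon=0.1,
                 ask_threshold=3, share_threshold=10):
        self.n_actions = n_actions
        self.q = defaultdict(lambda: [0.0] * n_actions)  # state -> Q-value per action
        self.visits = defaultdict(int)                    # state -> times updated
        self.ask_budget = ask_budget                      # remaining requests allowed
        self.share_budget = share_budget                  # remaining answers allowed
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.ask_threshold = ask_threshold                # hypothetical parameter
        self.share_threshold = share_threshold            # hypothetical parameter

    def wants_advice(self, state):
        # Hypothetical rule: ask only about rarely visited states while budget remains.
        return self.ask_budget > 0 and self.visits[state] < self.ask_threshold

    def can_share(self, state):
        # Hypothetical rule: share only well-explored states while budget remains.
        return self.share_budget > 0 and self.visits[state] >= self.share_threshold

    def ask_peers(self, state, peers):
        """Send one request (costs 1 ask budget); copy the best sharer's Q-values."""
        if not self.wants_advice(state):
            return False
        self.ask_budget -= 1  # the request itself consumes budget
        sharers = [p for p in peers if p is not self and p.can_share(state)]
        if not sharers:
            return False
        best = max(sharers, key=lambda p: p.visits[state])
        best.share_budget -= 1  # answering consumes the sharer's budget
        self.q[state] = list(best.q[state])  # copy so later updates do not alias
        return True

    def act(self, state):
        # Epsilon-greedy action selection over the agent's current Q-values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning update.
        self.visits[state] += 1
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

As a usage example, an experienced agent with `ask_budget=0, share_budget=5` and a novice with `ask_budget=5, share_budget=0` can be created; after the experienced agent has updated state `"s0"` enough times, `novice.ask_peers("s0", [experienced])` copies its Q-values and decrements both budgets. In this sketch the novice receives value estimates instead of one fixed action, so its greedy choice is recomputed from those values as it keeps applying its own Q-learning updates; whether this matches how PSAF uses shared Q-values is not stated in the abstract.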
