Paper Title

Multi-Agent Reinforcement Learning with Reward Delays

Paper Authors

Yuyang Zhang, Runyu Zhang, Yuantao Gu, Na Li

Paper Abstract

This paper considers multi-agent reinforcement learning (MARL) where rewards are received after delays, and the delay time varies across agents and across time steps. Based on the V-learning framework, the paper proposes MARL algorithms that efficiently handle reward delays. When the delays are finite, the algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}})$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is a measure of total delay formally defined in the paper. Moreover, the algorithm is extended to the case of infinite delays through a reward-skipping scheme, and it achieves a convergence rate similar to that of the finite-delay case.
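For intuition about the feedback model in the abstract, the following is a minimal Python sketch of an episodic multi-agent setting in which each agent's step reward becomes observable only after a delay that varies across agents and time steps. It is an illustrative assumption, not the paper's V-learning algorithm: the names `sample_delay` and `total_delay`, the uniform delay distribution, and the use of the summed delays as a proxy for $\mathcal{T}_K$ are all hypothetical, and the paper's formal definition of $\mathcal{T}_K$ may differ.

```python
import random
from collections import defaultdict

# Illustrative sketch only (not the paper's algorithm): simulate an episodic
# multi-agent setting where agent i's reward from step h of episode k becomes
# observable only after an agent- and step-dependent delay.

H = 5            # planning horizon
K = 10           # number of episodes
NUM_AGENTS = 2
MAX_DELAY = 3    # finite-delay regime

def sample_delay():
    # Delays differ per agent and per time step; drawn uniformly here for illustration.
    return random.randint(0, MAX_DELAY)

# pending[t] holds rewards that first become observable at episode t
pending = defaultdict(list)
total_delay = 0  # simple proxy for a total-delay measure like T_K (formal definition differs)

for k in range(K):
    # Deliver any rewards whose delays have elapsed by episode k.
    for (agent, ep, step, r) in pending.pop(k, []):
        print(f"episode {k}: agent {agent} observes reward {r:.2f} from (episode {ep}, step {step})")

    # Generate this episode's rewards; each is revealed only after its own delay.
    for h in range(H):
        for agent in range(NUM_AGENTS):
            r = random.random()
            d = sample_delay()
            total_delay += d
            pending[k + d].append((agent, k, h, r))

# In this toy sketch, rewards delayed past the last episode are simply never observed,
# which loosely mirrors why an infinite-delay setting needs a reward-skipping scheme.
print("total accumulated delay (proxy for T_K):", total_delay)
```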
