Paper Title

Federated Reinforcement Learning with Environment Heterogeneity

Authors

Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang

Abstract

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.
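
The core of \texttt{QAvg} is periodic server-side averaging of the agents' value functions while all trajectories stay local. Below is a minimal tabular sketch of that idea, assuming local Q-learning between averaging rounds; the random heterogeneous transition tensors, the shared reward table, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

# Sketch of the QAvg idea: n agents run local Q-learning in their own
# heterogeneous environments; a server periodically averages the Q-tables.
n_agents, n_states, n_actions = 5, 10, 4
gamma, alpha = 0.95, 0.1
rounds, local_steps = 50, 200
rng = np.random.default_rng(0)

# Hypothetical heterogeneous environments: each agent i has its own random
# transition distribution P[i, s, a, :]; the reward table R[s, a] is shared.
P = rng.dirichlet(np.ones(n_states), size=(n_agents, n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

q_global = np.zeros((n_states, n_actions))

for _ in range(rounds):
    local_qs = []
    for i in range(n_agents):
        q = q_global.copy()  # each round starts from the averaged Q-table
        s = rng.integers(n_states)
        for _ in range(local_steps):
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < 0.1 else int(q[s].argmax())
            s_next = rng.choice(n_states, p=P[i, s, a])
            # standard Q-learning update against agent i's own transitions
            q[s, a] += alpha * (R[s, a] + gamma * q[s_next].max() - q[s, a])
            s = s_next
        local_qs.append(q)
    # server step: average the local Q-tables (the federated aggregation)
    q_global = np.mean(local_qs, axis=0)

print("Greedy policy of the averaged Q-table:", q_global.argmax(axis=1))
```

The averaged Q-table induces a single policy intended to perform well across all $n$ environments; as the abstract notes, how far it falls from each environment's own optimum grows with the heterogeneity of the transitions.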
