Paper Title


Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

Authors

Jalal Arabneydi, Masoud Roudneshin, Amir G. Aghdam

Abstract


In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through deep state, where deep state refers to a set of orthogonal linear regressions of the states. In this article, we consider a homogeneous linear regression for Markov chain models (i.e., empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted average of states). Some planning algorithms are developed for the case when the model is known, and some reinforcement learning algorithms are proposed for the case when the model is not known completely. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.
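The abstract describes two concrete instances of the deep state: the empirical distribution of agent states (Markov chain models) and a weighted average of agent states (linear quadratic models). The sketch below illustrates both notions under simplifying assumptions; the function names and the normalization of the weights are illustrative choices, not part of the paper's formulation.

```python
import numpy as np

def empirical_distribution(states, num_values):
    """Deep state for the Markov chain model (assumed form):
    the fraction of agents occupying each of `num_values` discrete states."""
    counts = np.bincount(np.asarray(states), minlength=num_values)
    return counts / len(states)

def weighted_average(states, weights):
    """Deep state for the linear quadratic model (assumed form):
    a weighted average of the agents' (scalar) states.
    The paper uses orthonormal linear regressions; here the weights
    are simply normalized to sum to one for illustration."""
    w = np.asarray(weights, dtype=float)
    x = np.asarray(states, dtype=float)
    return w @ x / w.sum()

# Four agents in discrete states {0, 1, 1, 2} out of 3 possible values:
print(empirical_distribution([0, 1, 1, 2], 3))   # [0.25 0.5  0.25]

# Two agents with scalar states 1.0 and 3.0, equal weights:
print(weighted_average([1.0, 3.0], [0.5, 0.5]))  # 2.0
```

In both cases the deep state is a low-dimensional statistic of the joint state, which is what makes planning and learning tractable as the number of agents grows.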
