How are policy gradient methods affected by the limits of control?

Paper title

How are policy gradient methods affected by the limits of control?

Authors

Ingvar Ziemann, Anastasios Tsiamis, Henrik Sandberg, Nikolai Matni

Abstract

We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Our results apply to both state feedback and partially observed systems.
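The abstract concerns noise in stochastic policy gradient estimates on linear systems. The sketch below illustrates that kind of estimator. It is an assumption-laden example and not the estimator analysed in the paper. The function name `reinforce_gradient`, the Gaussian exploration policy u_t = K x_t + sigma * eps_t, the quadratic cost with weights Q and R, the process-noise scale and all default values are hypothetical choices. It returns a likelihood-ratio (REINFORCE) Monte Carlo estimate of the gradient of the expected cost with respect to the feedback gain K, together with the summed empirical variance of the per-rollout samples.

```python
import numpy as np


def reinforce_gradient(A, B, K, Q, R, sigma=0.1, horizon=20,
                       n_rollouts=200, noise_std=0.1, seed=0):
    """Hypothetical sketch: likelihood-ratio estimate of dJ/dK.

    System:  x_{t+1} = A x_t + B u_t + w_t, with w_t ~ N(0, noise_std^2 I)
    Policy:  u_t = K x_t + sigma * eps_t,  eps_t ~ N(0, I)
    Cost:    J = E[ sum_t x_t' Q x_t + u_t' R u_t ]
    Returns (gradient estimate of shape K.shape, summed sample variance).
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    grads = np.zeros((n_rollouts, m, n))
    costs = np.zeros(n_rollouts)
    for i in range(n_rollouts):
        x = rng.standard_normal(n)            # random initial state
        score = np.zeros((m, n))              # sum_t grad_K log pi(u_t | x_t)
        cost = 0.0
        for _ in range(horizon):
            eps = rng.standard_normal(m)
            u = K @ x + sigma * eps
            cost += x @ Q @ x + u @ R @ u
            # grad_K log N(u; K x, sigma^2 I) = (u - K x) x' / sigma^2 = eps x' / sigma
            score += np.outer(eps, x) / sigma
            x = A @ x + B @ u + noise_std * rng.standard_normal(n)
        grads[i] = score
        costs[i] = cost
    samples = costs[:, None, None] * grads    # one gradient sample per rollout
    return samples.mean(axis=0), samples.var(axis=0).sum()


if __name__ == "__main__":
    A = np.diag([0.9, 0.5])
    B = np.eye(2)
    K = np.zeros((2, 2))
    grad, var = reinforce_gradient(A, B, K, np.eye(2), np.eye(2))
    print(grad)
    print(var)
```

Calling the function with different (A, B) pairs and comparing the returned variance is one informal way to probe how estimator noise depends on the system. The sketch makes no claim about the bounds proved in the paper.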
