Title
How are policy gradient methods affected by the limits of control?
Authors
Abstract
We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Our results apply to both state feedback and partially observed systems.