Paper Title
On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems
Authors
Abstract
We consider the problem of Reinforcement Learning (RL) for nonlinear stochastic dynamical systems. We show that in the RL setting, there is an inherent ``Curse of Variance'' in addition to Bellman's infamous ``Curse of Dimensionality''. In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that, in order to control the explosive variance growth and thus ensure accuracy, this precludes the search for anything other than ``local'' feedback solutions in RL. We further show that the deterministic optimal control has a perturbation structure, in that the higher order terms do not affect the calculation of lower order terms, which can be utilized in RL to get accurate local solutions.