On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems

Title

On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems

Authors

Goyal, Raman; Chakravorty, Suman; Wang, Ran; Mohamed, Mohamed Naveed Gul

Abstract

We consider the problem of Reinforcement Learning (RL) for nonlinear stochastic dynamical systems. We show that in the RL setting there is an inherent "Curse of Variance" in addition to Bellman's infamous "Curse of Dimensionality". In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that this precludes the search for anything other than "local" feedback solutions in RL, in order to control the explosive variance growth and thus ensure accuracy. We further show that deterministic optimal control has a perturbation structure, in that the higher-order terms do not affect the calculation of the lower-order terms, which can be utilized in RL to obtain accurate local solutions.
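The perturbation structure described in the abstract can be illustrated in a classical deterministic setting: a power-series (Al'brekht-style) expansion of the Hamilton-Jacobi-Bellman (HJB) equation. The sketch below is not the authors' derivation or RL algorithm. It assumes a hypothetical scalar, continuous-time problem dx/dt = a*x + f2*x**2 + b*u with running cost q*x**2 + r*u**2; the parameter names and the functions value_gradient_coeffs, feedback_gains and hjb_residual_coeffs are illustrative only. Matching powers of x, the quadratic term of the value function solves a Riccati equation, and each higher-order coefficient solves a linear equation involving only lower-order coefficients.

```python
import math


def value_gradient_coeffs(a, b, q, r, f2, order):
    """Taylor coefficients [d1, ..., d_order] of V'(x) = sum_j d_j * x**j.

    Hypothetical scalar problem (illustration only):
        dx/dt = a*x + f2*x**2 + b*u,  running cost q*x**2 + r*u**2,
    with q > 0, r > 0, b != 0. Substituting u* = -(b / (2r)) * V'(x) into
    the HJB equation gives
        q*x**2 - (b**2 / (4r)) * V'(x)**2 + V'(x) * (a*x + f2*x**2) = 0.
    """
    c = b * b / (4.0 * r)
    s = math.sqrt(a * a + b * b * q / r)  # closed-loop decay rate of the linearization
    d = [0.0] * (order + 1)  # d[j] multiplies x**j; d[0] stays 0
    # Power x**2: scalar Riccati equation, stabilizing (positive) root.
    d[1] = (a + s) / (2.0 * c)
    # Power x**m, m >= 3: linear in the new unknown d[m-1]; every other
    # term uses only coefficients that are already known.
    for m in range(3, order + 2):
        known = sum(d[i] * d[m - i] for i in range(2, m - 1))
        d[m - 1] = (f2 * d[m - 2] - c * known) / s
    return d[1:]


def feedback_gains(d, b, r):
    """Gains k_j of the local feedback law u*(x) = sum_j k_j * x**j."""
    return [-(b / (2.0 * r)) * dj for dj in d]


def hjb_residual_coeffs(d, a, b, q, r, f2):
    """Coefficients of x**2 .. x**(len(d)+1) in the HJB residual of the truncated V'."""
    c = b * b / (4.0 * r)
    dd = [0.0] + list(d)  # dd[j] multiplies x**j
    res = []
    for m in range(2, len(d) + 2):
        square = sum(dd[i] * dd[m - i] for i in range(1, m))
        res.append((q if m == 2 else 0.0) - c * square + a * dd[m - 1] + f2 * dd[m - 2])
    return res


low = value_gradient_coeffs(1.0, 1.0, 1.0, 1.0, 0.5, order=3)
high = value_gradient_coeffs(1.0, 1.0, 1.0, 1.0, 0.5, order=6)
print(low == high[:3])  # True: raising the order does not revise lower-order terms
print(feedback_gains(high, b=1.0, r=1.0)[:2])
```

Because the recursion only reads coefficients of degree below m-1 when it solves for d[m-1], refining the approximation to a higher order never changes the terms already computed; hjb_residual_coeffs checks that every matched power of x vanishes up to rounding. The truncated series is a local approximation around x = 0.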
