Paper Title

Example When Local Optimal Policies Contain Unstable Control

Paper Authors

Bing Song, Jean-Jacques Slotine, Quang-Cuong Pham

Paper Abstract

We provide a new perspective on why reinforcement learning (RL) struggles with robustness and generalization. We show, by examples, that local optimal policies may contain unstable control for some dynamic parameters, and that overfitting to such instabilities can deteriorate robustness and generalization. Contraction analysis of neural control reveals that there exist boundaries between stable and unstable control with respect to the input gradients of control networks. Ignoring these stability boundaries, learning agents may label actions that cause instabilities for some dynamic parameters as high-value actions if those actions improve the expected return. The small fraction of such instabilities may not attract attention in empirical studies, which makes them a hidden risk for real-world applications. These instabilities can manifest themselves via overfitting, leading to failures in robustness and generalization. We propose stability constraints and terminal constraints to address this issue, demonstrated with a proximal policy optimization example.
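To make the idea of a gradient-based stability constraint more concrete, the following is a minimal sketch (not the authors' implementation) of how a penalty on the input gradient of a control network could be added to a PPO-style surrogate loss. The network architecture, the bound `grad_bound`, and the weight `lambda_stab` are illustrative assumptions standing in for the contraction-derived stability boundary described in the abstract.

```python
# Illustrative sketch only: PPO clipped surrogate augmented with a penalty on
# the input gradient of the policy (control) network. All hyperparameters and
# the architecture below are assumptions, not the paper's actual setup.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

def ppo_loss_with_stability(states, actions, old_log_probs, advantages,
                            log_std, clip_eps=0.2, lambda_stab=1e-2,
                            grad_bound=1.0):
    """Clipped PPO surrogate plus a penalty on the policy's input gradient."""
    states = states.clone().requires_grad_(True)
    mean = policy(states)                       # control output u(x)
    dist = torch.distributions.Normal(mean, log_std.exp())
    log_probs = dist.log_prob(actions).sum(-1)

    # Standard clipped surrogate objective.
    ratio = (log_probs - old_log_probs).exp()
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages,
    )

    # Input gradient du/dx of the control network, obtained via autograd.
    # `grad_bound` plays the role of a stability boundary on this gradient.
    grads = torch.autograd.grad(mean.sum(), states, create_graph=True)[0]
    stability_penalty = torch.relu(grads.norm(dim=-1) - grad_bound).mean()

    return -surrogate.mean() + lambda_stab * stability_penalty
```

The design choice here is simply to expose the policy's input gradient, which the abstract identifies as the quantity separating stable from unstable control, and penalize it whenever it exceeds an assumed bound; the paper's actual constraints (including the terminal constraints) may be formulated differently.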
