Paper Title


Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

Paper Authors

JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox

Paper Abstract


Robust reinforcement learning (RL) considers the problem of learning policies that perform well in the worst case among a set of possible environment parameter values. In real-world environments, choosing the set of possible values for robust RL can be a difficult task. When that set is specified too narrowly, the agent will be left vulnerable to reasonable parameter values unaccounted for. When specified too broadly, the agent will be too cautious. In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust. FARR implicitly defines the set of feasible parameter values as those on which an agent could achieve a benchmark reward given enough training resources. By formulating this problem as a two-player zero-sum game, optimizing the FARR objective jointly produces an adversarial distribution over parameter values with feasible support and a policy robust over this feasible parameter set. We demonstrate that approximate Nash equilibria for this objective can be found using a variation of the PSRO algorithm. Furthermore, we show that an optimal agent trained with FARR is more robust to feasible adversarial parameter selection than with existing minimax, domain-randomization, and regret objectives in a parameterized gridworld and three MuJoCo control environments.
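The core idea above, restricting the worst case to the feasible parameter set, can be sketched with a toy example. All names, reward values, and the benchmark below are hypothetical illustrations of the objective, not the paper's implementation (which solves the two-player game with a PSRO variant):

```python
# Illustrative FARR-style sketch (hypothetical toy values, not the paper's code).
# best_reward[theta] stands in for the reward an agent could achieve on
# parameter value theta given enough training resources.
BENCHMARK = 0.5  # benchmark reward defining feasibility

best_reward = {"easy": 1.0, "hard": 0.7, "impossible": 0.1}

# A parameter value is feasible if some agent could reach the benchmark on it.
feasible = {theta for theta, r in best_reward.items() if r >= BENCHMARK}

def reward(policy, theta):
    # Toy reward table: candidate policy x environment parameter.
    table = {
        ("cautious", "easy"): 0.6, ("cautious", "hard"): 0.6, ("cautious", "impossible"): 0.1,
        ("greedy", "easy"): 1.0, ("greedy", "hard"): 0.2, ("greedy", "impossible"): 0.0,
    }
    return table[(policy, theta)]

# FARR-style objective: maximize the worst-case reward over feasible
# parameters only, so the adversary cannot exploit infeasible values.
def farr_value(policy):
    return min(reward(policy, theta) for theta in feasible)

policies = ["cautious", "greedy"]
robust_policy = max(policies, key=farr_value)
print(robust_policy, farr_value(robust_policy))  # -> cautious 0.6
```

Note how "impossible" is excluded from the inner minimum: under a plain minimax objective over all parameter values, the adversary would pick it and force the agent to be overly cautious, which is exactly the failure mode FARR is designed to avoid.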
