Paper Title
oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions
Paper Authors
Paper Abstract
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods. While Inverse Reinforcement Learning (IRL) is a solution for recovering reward functions from demonstrations alone, these learned rewards are generally heavily \textit{entangled} with the dynamics of the environment and therefore not portable or \emph{robust} to changing environments. Modern adversarial methods have achieved some success in reducing reward entanglement in the IRL setting. In this work, we leverage one such method, Adversarial Inverse Reinforcement Learning (AIRL), to propose an algorithm that learns hierarchical, disentangled rewards with a policy over options. We show that this method can learn \emph{generalizable} policies and reward functions in complex transfer learning tasks, while yielding results on continuous control benchmarks that are comparable to those of state-of-the-art methods.
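For context, AIRL trains a discriminator of the following form (Fu et al., 2018), where restricting $g_\theta$ to depend on states only is what yields a disentangled reward; the option-conditioned variant shown second is only a hedged sketch of how a hierarchical extension with a policy over options $\pi_\Omega(\omega \mid s)$ and intra-option policies $\pi_\omega(a \mid s)$ could look, not the paper's exact formulation.
\[
D_{\theta,\phi}(s,a,s') \;=\; \frac{\exp\{f_{\theta,\phi}(s,a,s')\}}{\exp\{f_{\theta,\phi}(s,a,s')\} + \pi(a \mid s)},
\qquad
f_{\theta,\phi}(s,a,s') \;=\; g_\theta(s) + \gamma\, h_\phi(s') - h_\phi(s).
\]
% Hypothetical option-conditioned variant (an assumption for illustration,
% not the formulation proposed in the paper):
\[
D_{\theta,\phi}(s,a,s' \mid \omega) \;=\; \frac{\exp\{f_{\theta,\phi}(s,a,s' \mid \omega)\}}{\exp\{f_{\theta,\phi}(s,a,s' \mid \omega)\} + \pi_\omega(a \mid s)}.
\]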