Paper Title
An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
Paper Authors
Paper Abstract
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly when it is critical to limit the amount of interaction with the target environment during learning. This problem is especially important in sim-to-real transfer, because simulators inevitably model real-world dynamics imperfectly. In this paper, we show that one existing solution to this transfer problem, grounded action transformation, is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for grounded transfer learning. To validate our hypothesis, we derive a new algorithm, generative adversarial reinforced action transformation (GARAT), based on adversarial imitation from observation techniques. We run experiments in several domains with mismatched dynamics and find that agents trained with GARAT achieve higher returns in the target environment than agents trained with existing black-box transfer methods.