Paper Title

Data-efficient Domain Randomization with Bayesian Optimization

Authors

Fabio Muratore, Christian Eilers, Michael Gienger, Jan Peters

Abstract


When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, known as the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters, in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning, given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source domain distribution parameters such that the resulting policy maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as in sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer while significantly reducing the required prior knowledge.
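The loop the abstract describes — train a policy under candidate source-domain distribution parameters, evaluate it with a few rollouts on the target domain, and let Bayesian optimization propose the next parameters — can be sketched as below. This is a minimal toy sketch, not the paper's implementation: the scalar domain parameter `phi`, the stand-ins `train_policy` and `real_world_return`, and the GP/UCB settings are all illustrative assumptions.

```python
import numpy as np

# Toy stand-ins (illustrative assumptions, not from the paper): the source
# domain is parameterized by a single scalar phi in [0, 1]; the real target
# domain corresponds to PHI_TRUE, which the algorithm does not know.
PHI_TRUE = 0.7

def train_policy(phi):
    """Stand-in for policy optimization in the randomized simulator.
    Here the 'policy' is simply the parameter it was trained under."""
    return phi

def real_world_return(policy):
    """Stand-in for a sparse rollout on the real system: return is higher
    the closer the training distribution was to the true domain."""
    return -(policy - PHI_TRUE) ** 2

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def ucb(phis, returns, cand, beta=2.0):
    """GP-posterior upper confidence bound over candidate parameters."""
    K = rbf(phis, phis) + 1e-6 * np.eye(len(phis))
    Ks = rbf(cand, phis)
    mu = Ks @ np.linalg.solve(K, returns)                       # posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)  # posterior var
    return mu + beta * np.sqrt(np.clip(var, 0.0, None))

rng = np.random.default_rng(0)
phis = list(rng.uniform(0.0, 1.0, size=3))           # initial random design
returns = [real_world_return(train_policy(p)) for p in phis]

for _ in range(15):                                  # BO outer loop
    cand = np.linspace(0.0, 1.0, 201)
    scores = ucb(np.array(phis), np.array(returns), cand)
    phi_next = float(cand[np.argmax(scores)])        # acquisition maximum
    policy = train_policy(phi_next)                  # retrain in simulation
    phis.append(phi_next)
    returns.append(real_world_return(policy))        # one real-world evaluation

phi_best = phis[int(np.argmax(returns))]             # best distribution found
```

In the actual method, `train_policy` would be a full reinforcement-learning run in the randomized simulator and `real_world_return` would come from a handful of target-domain rollouts, so each outer iteration is expensive; the Gaussian-process surrogate is what keeps the number of real-world evaluations small.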
