Paper title
GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control
Authors
Abstract
The ability of Gaussian processes (GPs) to predict the behavior of dynamical systems as a more sample-efficient alternative to parametric models seems promising for real-world robotics research. However, the computational complexity of GPs has made policy search a highly time- and memory-consuming process that has not been able to scale to larger problems. In this work, we develop a policy optimization method that leverages fast predictive sampling methods to process batches of trajectories in every forward pass and computes gradient updates over policy parameters by automatic differentiation of Monte Carlo evaluations, all on GPU. We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine. Benchmark results show a significant speedup over exact methods and showcase the scalability of our method to larger policy networks, longer horizons, and up to thousands of trajectories with a sublinear drop in speed.
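The core idea in the abstract, batched trajectory rollouts whose Monte Carlo cost estimate is differentiated automatically with respect to the policy parameters, can be illustrated with a minimal JAX sketch. This is not the paper's implementation. The toy double-integrator dynamics with additive Gaussian noise stand in for a GP predictive sample, and the linear tanh policy, the cost, the step size, and all names (`policy`, `sample_next_state`, `rollout_cost`, `mc_objective`) are assumptions made for illustration only.

```python
import jax
import jax.numpy as jnp

STATE_DIM, HORIZON, N_TRAJ = 2, 20, 256

def policy(params, x):
    # Hypothetical linear state-feedback policy, squashed to [-1, 1].
    return jnp.tanh(x @ params["W"] + params["b"])

def sample_next_state(x, u, eps):
    # Stand-in for a GP predictive sample, written in reparameterized
    # form (mean + std * eps) so gradients flow through the sample.
    # A real implementation would query a trained GP posterior here.
    mean = x + 0.1 * jnp.stack([x[1], u[0]])
    std = 0.01
    return mean + std * eps

def rollout_cost(params, x0, noise, reference):
    # Simulate one trajectory and sum the squared tracking error
    # of the first state component against a constant reference.
    def step(x, eps):
        u = policy(params, x)
        x_next = sample_next_state(x, u, eps)
        return x_next, (x_next[0] - reference) ** 2
    _, costs = jax.lax.scan(step, x0, noise)
    return jnp.sum(costs)

def mc_objective(params, x0s, noises, reference):
    # Monte Carlo estimate: mean cost over a batch of trajectories,
    # all evaluated in one vectorized forward pass.
    batched = jax.vmap(rollout_cost, in_axes=(None, 0, 0, None))
    return jnp.mean(batched(params, x0s, noises, reference))

# Automatic differentiation of the Monte Carlo estimate w.r.t. params.
grad_fn = jax.jit(jax.value_and_grad(mc_objective))

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
x0s = 0.1 * jax.random.normal(k1, (N_TRAJ, STATE_DIM))
noises = jax.random.normal(k2, (N_TRAJ, HORIZON, STATE_DIM))
params = {"W": jnp.zeros((STATE_DIM, 1)), "b": jnp.zeros((1,))}

losses = []
for _ in range(50):
    loss, grads = grad_fn(params, x0s, noises, 1.0)
    # Plain gradient descent; the step size is an arbitrary choice.
    params = jax.tree_util.tree_map(lambda p, g: p - 0.02 * g, params, grads)
    losses.append(float(loss))
```

In this sketch, `jax.vmap` handles the trajectory batch and `jax.lax.scan` handles the time horizon, so the whole objective compiles into a single program that runs on a GPU when JAX is installed with a GPU backend. Fixing the noise draws across iterations keeps the objective deterministic, which is one common design choice; resampling the noise at every step is another.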