Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics

Paper Title

Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics

Authors

Swagat Kumar, Hayden Sampson, Ardhendu Behera

Abstract

This paper presents a benchmarking study of several state-of-the-art reinforcement learning algorithms applied to two simulated vision-based robotics problems. The algorithms considered are soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight Experience Replay (HER). Their performance is compared on two PyBullet simulation environments, KukaDiverseObjectEnv and RacecarZEDGymEnv. In both environments the state observations are RGB images and the action space is continuous, which makes them difficult to solve. Because these are essentially single-goal environments, a number of strategies are proposed for generating the intermediate hindsight goals that the HER algorithm requires. In addition, several feature extraction architectures are proposed to incorporate spatial and temporal attention into the learning process. Rigorous simulation experiments establish the improvements achieved with these components. To the best of our knowledge, no such benchmarking study exists for these two vision-based robotics problems, which makes it a novel contribution to the field.
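
The abstract does not say how the paper's intermediate hindsight goals are constructed. To illustrate what HER needs from such a strategy, here is a minimal sketch of the generic "future" relabeling scheme from the original HER formulation. It is not the authors' method. It assumes each stored transition carries a low-dimensional `achieved_goal` (for example, a gripper or car position) alongside the RGB observation, because a single-goal, image-based environment does not provide one by itself. The names `Transition`, `sparse_reward`, and `her_future_relabel`, the 0/-1 reward convention, and the `tol` threshold are all hypothetical choices made for this sketch.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    obs: object                # e.g. an RGB image; treated as opaque here
    action: Tuple[float, ...]  # continuous action
    achieved_goal: Goal        # ASSUMPTION: goal actually reached after this step
    goal: Goal                 # goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    """Sparse reward: 0 if the achieved goal is within tol of the goal, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_future_relabel(
    episode: Sequence[Transition],
    k: int,
    reward_fn: Callable[[Goal, Goal], float],
    rng: random.Random,
) -> List[Transition]:
    """Return the original transitions plus k relabeled copies of each one.

    'future' strategy: for step t, a substitute goal is the achieved goal of a
    randomly chosen step t' with t <= t' < T, and the reward is recomputed
    against that substitute goal.
    """
    out = list(episode)
    T = len(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = rng.randint(t, T - 1)  # inclusive on both ends
            new_goal = episode[future].achieved_goal
            out.append(replace(tr, goal=new_goal,
                               reward=reward_fn(tr.achieved_goal, new_goal)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy episode: the agent never reaches the true goal (1.0, 1.0).
    episode = [
        Transition(obs=None, action=(0.1,), achieved_goal=(0.1 * (i + 1), 0.0),
                   goal=(1.0, 1.0), reward=-1.0)
        for i in range(3)
    ]
    buffer = her_future_relabel(episode, k=2, reward_fn=sparse_reward, rng=rng)
    print(len(buffer))  # 9: 3 original + 3 * 2 relabeled
```

In the toy episode every original reward is -1. Relabeling still produces successful (reward 0) transitions whenever the sampled future step is the current one, and this is how HER extracts a learning signal from failed episodes. The frozen dataclass with `dataclasses.replace` keeps the original transitions unchanged while their relabeled copies are appended.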
