Paper title
A framework for reinforcement learning with autocorrelated actions
Paper authors
Paper abstract
The subject of this paper is reinforcement learning. Policies are considered here that produce actions based on states and on random elements that are autocorrelated over subsequent time instants. Consequently, the agent learns from experiments that are distributed over time and thus potentially give better clues for policy improvement. Moreover, the physical implementation of such policies, e.g. in robotics, is less problematic, since it avoids making robots shake. This stands in opposition to most RL algorithms, which add white noise to the control signal, causing unwanted shaking of the robot. An algorithm is introduced here that approximately optimizes the aforementioned policies. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms the others on three of these problems.
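To illustrate the contrast the abstract draws, the sketch below compares i.i.d. white exploration noise with a first-order autoregressive (AR(1)) noise process, a common way to obtain temporally autocorrelated random elements. This is only a minimal illustration of the general idea, not the paper's algorithm; the parameter names (`alpha`, `sigma`) and the AR(1) choice are assumptions for demonstration.

```python
import numpy as np

def white_noise(n_steps, sigma=0.3, seed=0):
    # i.i.d. Gaussian noise: each step is independent of the previous one,
    # so consecutive control perturbations jump around (the "shaking" case).
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=n_steps)

def ar1_noise(n_steps, alpha=0.9, sigma=0.3, seed=0):
    # AR(1) autocorrelated noise:
    #   eps_t = alpha * eps_{t-1} + sqrt(1 - alpha^2) * xi_t,  xi_t ~ N(0, sigma^2)
    # The sqrt(1 - alpha^2) factor keeps the stationary standard deviation
    # equal to sigma, so both schemes perturb actions with comparable magnitude.
    rng = np.random.default_rng(seed)
    eps = np.zeros(n_steps)
    for t in range(1, n_steps):
        eps[t] = alpha * eps[t - 1] + np.sqrt(1.0 - alpha**2) * rng.normal(0.0, sigma)
    return eps

if __name__ == "__main__":
    w = white_noise(10_000)
    a = ar1_noise(10_000)
    # Smoothness proxy: mean absolute step-to-step change. The autocorrelated
    # signal changes far less between consecutive steps despite having a
    # similar overall spread, which is why it is gentler on physical actuators.
    print("white-noise mean |step change|:", np.abs(np.diff(w)).mean())
    print("AR(1)       mean |step change|:", np.abs(np.diff(a)).mean())
```

With `alpha = 0.9`, the expected step-to-step change of the AR(1) signal is smaller than that of white noise by a factor of roughly `sqrt((1 - alpha) / 1)` relative to the white case, which mirrors the paper's motivation: exploration that is spread over time rather than injected independently at every instant.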