Paper Title
Offline Reinforcement Learning at Multiple Frequencies
Paper Authors
Paper Abstract
Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naïve mixing by 50% on average.
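As a rough illustration of the $N$-step return mechanism the abstract describes, the sketch below picks $N$ per trajectory so that each backup spans roughly the same wall-clock horizon regardless of the control timestep, then computes a standard $N$-step bootstrapped return. The helper names (`scaled_n`, `n_step_return`), the base horizon, and the exact scaling rule are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of frequency-aware n-step returns, assuming the goal is to
# keep the real-time horizon of each Q-value backup constant across control
# frequencies. All names and constants here are hypothetical.

def scaled_n(base_n: int, base_dt: float, dt: float) -> int:
    """Choose n for a trajectory with timestep dt so that n * dt stays roughly
    constant: finer discretizations (smaller dt) get proportionally larger n."""
    return max(1, round(base_n * base_dt / dt))

def n_step_return(rewards, bootstrap_value: float, gamma: float, n: int) -> float:
    """Standard n-step bootstrapped return: discounted sum of the first n rewards
    plus the discounted bootstrap value (e.g. a target Q-value) after n steps."""
    n = min(n, len(rewards))
    ret = sum(gamma ** k * rewards[k] for k in range(n))
    return ret + gamma ** n * bootstrap_value

# Example: trajectories logged at 10 Hz (dt = 0.1 s) and 40 Hz (dt = 0.025 s)
# are mixed in one dataset; with base_n = 5 at dt = 0.1 s, the 40 Hz data uses
# n = 20, so every backup still covers about 0.5 s of real time.
rewards_10hz = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0]
n = scaled_n(base_n=5, base_dt=0.1, dt=0.1)
print(n_step_return(rewards_10hz, bootstrap_value=2.0, gamma=0.99, n=n))
```

Matching the effective backup horizon across frequencies is one plausible way to equalize how quickly $Q$-values propagate between data collected at different discretizations.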