Paper Title
Offline Reinforcement Learning at Multiple Frequencies
Paper Authors
Paper Abstract
Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naïve mixing by 50% on average.
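As a rough illustration of the $N$-step return mechanism the abstract describes, the sketch below picks $N$ per trajectory so that each backup spans roughly the same wall-clock horizon regardless of the control timestep, then computes a standard $N$-step bootstrapped return. The helper names (`scaled_n`, `n_step_return`), the base horizon, and the exact scaling rule are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of frequency-aware n-step returns, assuming the goal is to
# keep the real-time horizon of each Q-value backup constant across control
# frequencies. All names and constants here are hypothetical.

def scaled_n(base_n: int, base_dt: float, dt: float) -> int:
    """Choose n for a trajectory with timestep dt so that n * dt stays roughly
    constant: finer discretizations (smaller dt) get proportionally larger n."""
    return max(1, round(base_n * base_dt / dt))

def n_step_return(rewards, bootstrap_value: float, gamma: float, n: int) -> float:
    """Standard n-step bootstrapped return: discounted sum of the first n rewards
    plus the discounted bootstrap value (e.g. a target Q-value) after n steps."""
    n = min(n, len(rewards))
    ret = sum(gamma ** k * rewards[k] for k in range(n))
    return ret + gamma ** n * bootstrap_value

# Example: trajectories logged at 10 Hz (dt = 0.1 s) and 40 Hz (dt = 0.025 s)
# are mixed in one dataset; with base_n = 5 at dt = 0.1 s, the 40 Hz data uses
# n = 20, so every backup still covers about 0.5 s of real time.
rewards_10hz = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0]
n = scaled_n(base_n=5, base_dt=0.1, dt=0.1)
print(n_step_return(rewards_10hz, bootstrap_value=2.0, gamma=0.99, n=n))
```

Matching the effective backup horizon across frequencies is one plausible way to equalize how quickly $Q$-values propagate between data collected at different discretizations.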