Paper Title
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
Paper Authors
Paper Abstract
We investigate whether self-supervised learning (SSL) can improve online reinforcement learning (RL) from pixels. We extend the contrastive reinforcement learning framework (e.g., CURL) that jointly optimizes SSL and RL losses, and conduct extensive experiments with various self-supervised losses. Our observations suggest that the existing SSL framework for RL fails to bring meaningful improvement over baselines that only take advantage of image augmentation, when the same amount of data and augmentation is used. We further perform evolutionary searches to find the optimal combination of multiple self-supervised losses for RL, but find that even such a loss combination fails to meaningfully outperform methods that only utilize carefully designed image augmentations. After evaluating these approaches together in multiple different environments, including a real-world robot environment, we confirm that no single self-supervised loss or image augmentation method can dominate all environments, and that the current framework for joint optimization of SSL and RL is limited. Finally, we conduct ablation studies on multiple factors and demonstrate the properties of representations learned with different approaches.
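The joint optimization the abstract refers to adds a self-supervised term to the RL objective. A minimal sketch of that idea, assuming a CURL-style contrastive (InfoNCE) loss over embeddings of two augmented views and a simple weighted sum with the RL loss (the function names, the `ssl_weight` coefficient, and the use of NumPy here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE) loss: embedding i of one augmented view
    should match embedding i of the other view, against all other rows."""
    # L2-normalize both sets of embeddings
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # diagonal = matched pairs

def joint_loss(rl_loss, anchors, positives, ssl_weight=1.0):
    """Joint objective: RL loss plus a weighted self-supervised loss
    (hypothetical weighting scheme for illustration)."""
    return rl_loss + ssl_weight * info_nce_loss(anchors, positives)
```

Correctly matched view pairs should yield a lower contrastive loss than mismatched ones, which is the signal the SSL term contributes alongside the RL gradient.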