Paper Title

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Paper Authors

Zhao Mandi, Pieter Abbeel, Stephen James

Paper Abstract

Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches also in reinforcement learning, which typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.
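The abstract contrasts two training paradigms: plain multi-task RL pretraining followed by fine-tuning on a novel task, versus meta-RL pretraining followed by test-time adaptation. The sketch below is only a schematic illustration of that distinction, not the paper's code; all names (`Policy`, `rl_update`, the toy tasks, step counts) are hypothetical placeholders standing in for a real policy network, RL optimizer, and benchmark tasks such as those in Procgen, RLBench, or Atari.

```python
# Minimal sketch (assumed, not from the paper) of the two paradigms compared above.
import random


class Policy:
    """Toy stand-in for a policy network: a single scalar 'parameter'."""

    def __init__(self, theta=0.0):
        self.theta = theta

    def copy(self):
        return Policy(self.theta)


def rl_update(policy, task, lr=0.1):
    # Placeholder for one RL gradient step on `task` (e.g. a PPO update in practice).
    policy.theta += lr * random.uniform(-1.0, 1.0)


def multitask_pretrain_then_finetune(train_tasks, new_task, steps=100):
    """Paradigm 1: multi-task pretraining, then plain fine-tuning on the new task."""
    policy = Policy()
    for _ in range(steps):                 # pretrain on a mixture of training tasks
        rl_update(policy, random.choice(train_tasks))
    finetuned = policy.copy()
    for _ in range(steps // 10):           # fine-tune on the completely novel task
        rl_update(finetuned, new_task)
    return finetuned


def meta_pretrain_then_adapt(train_tasks, new_task, steps=100):
    """Paradigm 2 (schematic): meta-RL pretraining with meta test-time adaptation."""
    meta_policy = Policy()
    for _ in range(steps):                 # outer loop over sampled training tasks
        inner = meta_policy.copy()
        rl_update(inner, random.choice(train_tasks))          # inner-loop adaptation
        meta_policy.theta += 0.1 * (inner.theta - meta_policy.theta)  # meta-update
    adapted = meta_policy.copy()
    for _ in range(steps // 10):           # meta test-time adaptation on the new task
        rl_update(adapted, new_task)
    return adapted


if __name__ == "__main__":
    tasks = ["task_a", "task_b", "task_c"]  # stand-ins for benchmark training tasks
    print(multitask_pretrain_then_finetune(tasks, "novel_task").theta)
    print(meta_pretrain_then_adapt(tasks, "novel_task").theta)
```

The paper's finding is that, when the held-out evaluation uses genuinely different tasks rather than variations of the training tasks, the simpler first pipeline matches or beats the second while being cheaper to run.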
