Paper Title

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

Paper Authors

Zhao Mandi, Pieter Abbeel, Stephen James

Paper Abstract

Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches also in reinforcement learning, which typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.
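The abstract contrasts two training paradigms: plain multi-task RL pretraining followed by fine-tuning on a novel task, versus meta-RL pretraining followed by test-time adaptation. The sketch below is only a schematic illustration of that distinction, not the paper's code; all names (`Policy`, `rl_update`, the toy tasks, step counts) are hypothetical placeholders standing in for a real policy network, RL optimizer, and benchmark tasks such as those in Procgen, RLBench, or Atari.

```python
# Minimal sketch (assumed, not from the paper) of the two paradigms compared above.
import random


class Policy:
    """Toy stand-in for a policy network: a single scalar 'parameter'."""

    def __init__(self, theta=0.0):
        self.theta = theta

    def copy(self):
        return Policy(self.theta)


def rl_update(policy, task, lr=0.1):
    # Placeholder for one RL gradient step on `task` (e.g. a PPO update in practice).
    policy.theta += lr * random.uniform(-1.0, 1.0)


def multitask_pretrain_then_finetune(train_tasks, new_task, steps=100):
    """Paradigm 1: multi-task pretraining, then plain fine-tuning on the new task."""
    policy = Policy()
    for _ in range(steps):                 # pretrain on a mixture of training tasks
        rl_update(policy, random.choice(train_tasks))
    finetuned = policy.copy()
    for _ in range(steps // 10):           # fine-tune on the completely novel task
        rl_update(finetuned, new_task)
    return finetuned


def meta_pretrain_then_adapt(train_tasks, new_task, steps=100):
    """Paradigm 2 (schematic): meta-RL pretraining with meta test-time adaptation."""
    meta_policy = Policy()
    for _ in range(steps):                 # outer loop over sampled training tasks
        inner = meta_policy.copy()
        rl_update(inner, random.choice(train_tasks))          # inner-loop adaptation
        meta_policy.theta += 0.1 * (inner.theta - meta_policy.theta)  # meta-update
    adapted = meta_policy.copy()
    for _ in range(steps // 10):           # meta test-time adaptation on the new task
        rl_update(adapted, new_task)
    return adapted


if __name__ == "__main__":
    tasks = ["task_a", "task_b", "task_c"]  # stand-ins for benchmark training tasks
    print(multitask_pretrain_then_finetune(tasks, "novel_task").theta)
    print(meta_pretrain_then_adapt(tasks, "novel_task").theta)
```

The paper's finding is that, when the held-out evaluation uses genuinely different tasks rather than variations of the training tasks, the simpler first pipeline matches or beats the second while being cheaper to run.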
