Title

Linear Mode Connectivity in Multitask and Continual Learning

Authors

Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, Hassan Ghasemzadeh

Abstract

Continual (sequential) training and multitask (simultaneous) training are often attempting to solve the same overall objective: to find a solution that performs well on all considered tasks. The main difference is in the training regimes, where continual learning can only have access to one task at a time, which for neural networks typically leads to catastrophic forgetting. That is, the solution found for a subsequent task does not perform well on the previous ones anymore. However, the relationship between the different minima that the two training regimes arrive at is not well understood. What sets them apart? Is there a local structure that could explain the difference in performance achieved by the two different schemes? Motivated by recent work showing that different minima of the same task are typically connected by very simple curves of low error, we investigate whether multitask and continual solutions are similarly connected. We empirically find that indeed such connectivity can be reliably achieved and, more interestingly, it can be done by a linear path, conditioned on having the same initialization for both. We thoroughly analyze this observation and discuss its significance for the continual learning process. Furthermore, we exploit this finding to propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution. We show that our method outperforms several state-of-the-art continual learning algorithms on various vision benchmarks.
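The abstract's central object, a low-error linear path between two minima, can be sketched numerically: interpolate the parameters of two solutions and evaluate the loss along the path. The following is a minimal toy illustration, not the paper's method; the linear model, synthetic data, and the two "minima" (small perturbations of a shared solution, standing in for a continually trained and a multitask solution reached from the same initialization) are all assumptions made for the sketch.

```python
import numpy as np

def interpolate_weights(w1, w2, alpha):
    """Linearly interpolate between two parameter vectors: (1-alpha)*w1 + alpha*w2."""
    return (1 - alpha) * w1 + alpha * w2

def mse_loss(w, X, y):
    """Mean squared error of a linear model with weights w on data (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true

# Two hypothetical near-optimal solutions (toy stand-ins for the continual
# and multitask minima discussed in the abstract).
w_a = w_true + 0.01 * rng.normal(size=5)
w_b = w_true - 0.01 * rng.normal(size=5)

# Scan the linear path between them; uniformly low loss along the path is
# what "linear mode connectivity" refers to.
alphas = np.linspace(0.0, 1.0, 11)
path_losses = [mse_loss(interpolate_weights(w_a, w_b, a), X, y) for a in alphas]
```

In this convex toy problem the path is trivially low-loss; the paper's empirical finding is that the same holds for the non-convex losses of neural networks, provided both solutions start from the same initialization.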
