Paper Title
Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
Paper Authors
Paper Abstract
The paradigm of machine intelligence is shifting from purely supervised learning to a more practical scenario in which abundant loosely related unlabeled data are available and labeled data are scarce. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which the task distribution evolves over time. We name this problem Semi-supervised meta-learning with Evolving Task diStributions, abbreviated as SETS. Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting of previously learned task distributions due to task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER) to tackle these two major challenges. Specifically, ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to remember previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate that the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than strong related baselines.
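The abstract names the two regularizers but does not spell out their exact forms, so the PyTorch sketch below is only a plausible instantiation, not the paper's method: the mutual information term uses the standard batch estimate I(x; ŷ) ≈ H(E_x[p(ŷ|x)]) − E_x[H(p(ŷ|x))], and the optimal transport term uses a small entropic (Sinkhorn) solver between current features and features cached from earlier task distributions. All names here (SmallNet, order_style_loss, memory_feats, lam_mi, lam_ot) and the sign of the MI term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


class SmallNet(torch.nn.Module):
    """Toy backbone + classifier head; stands in for the meta-learner."""

    def __init__(self, d_in=32, d_feat=16, n_classes=5):
        super().__init__()
        self.backbone = torch.nn.Linear(d_in, d_feat)
        self.head = torch.nn.Linear(d_feat, n_classes)

    def forward(self, x):
        z = torch.relu(self.backbone(x))
        return self.head(z), z  # (logits, features)


def mutual_information(probs, eps=1e-8):
    # Standard batch estimate: I(x; y_hat) ~= H(E_x[p]) - E_x[H(p)].
    marginal = probs.mean(dim=0)
    h_marginal = -(marginal * (marginal + eps).log()).sum()
    h_conditional = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return h_marginal - h_conditional


def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    # Entropic-regularized OT cost between two feature batches, uniform weights.
    cost = torch.cdist(x, y) ** 2
    cost = cost / (cost.max().detach() + 1e-8)  # normalize for numerical stability
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)
    nu = torch.full((m,), 1.0 / m, device=x.device)
    K = torch.exp(-cost / eps)
    u = torch.ones_like(mu)
    for _ in range(n_iters):  # alternating Sinkhorn scaling updates
        v = nu / (K.t() @ u + 1e-8)
        u = mu / (K @ v + 1e-8)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)  # approximate transport plan
    return (plan * cost).sum()


def order_style_loss(model, x_lab, y_lab, x_unlab, memory_feats,
                     lam_mi=0.1, lam_ot=0.1):
    logits, feats = model(x_lab)
    loss_sup = F.cross_entropy(logits, y_lab)  # scarce labeled data of the task
    u_logits, _ = model(x_unlab)
    # (i) MI term on unlabeled (possibly OOD) data; the sign/target here is a guess.
    loss_mi = -mutual_information(F.softmax(u_logits, dim=1))
    # (ii) OT term pulling current features toward cached ones to limit forgetting.
    loss_ot = sinkhorn_distance(feats, memory_feats)
    return loss_sup + lam_mi * loss_mi + lam_ot * loss_ot


# Usage with random tensors, purely to show the shapes involved.
model = SmallNet()
loss = order_style_loss(
    model,
    x_lab=torch.randn(8, 32), y_lab=torch.randint(0, 5, (8,)),
    x_unlab=torch.randn(64, 32),
    memory_feats=torch.randn(20, 16),  # features cached from earlier tasks
)
loss.backward()
```

Normalizing the cost matrix before exponentiation is a pragmatic choice to keep the Sinkhorn kernel from underflowing in float32; the paper may well use a different OT formulation or a log-domain solver.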