Paper Title
Embedded Ensembles: Infinite Width Limit and Operating Regimes
Paper Authors
Paper Abstract
A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide-network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes (independent and collective) depending on the architecture and initialization strategy of the ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical predictions with a wide range of experiments on finite networks, and further study empirically various effects such as the transition between the two regimes, the scaling of ensemble performance with network width and number of models, and the dependence of performance on a number of architecture and hyperparameter choices.
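To make the weight-sharing idea concrete, here is a minimal NumPy sketch of one layer of a BatchEnsemble-style embedded ensemble. It is an illustration under assumptions, not the paper's implementation: a single reference weight matrix `W` is shared by all ensemble members, and model `k` owns only two rank-1 scaling vectors `r_k`, `s_k`, so its effective weight is the elementwise product `W * outer(r_k, s_k)`. All names in the snippet are illustrative.

```python
import numpy as np

def embedded_ensemble_forward(x, W, R, S):
    """Forward pass of one BatchEnsemble-style embedded-ensemble layer.

    W: (d_out, d_in) reference weights shared by all ensemble members.
    R: (n_models, d_out) and S: (n_models, d_in) per-model rank-1 factors,
       so member k's effective weight is W * outer(R[k], S[k]).
    x: (d_in,) input; returns (n_models, d_out), one output per member.
    """
    # Identity used for efficiency: (W * r s^T) x == r * (W @ (s * x)),
    # so the shared matmul with W is done once per member without ever
    # materializing the per-model weight matrices.
    return R * (W @ (S * x).T).T

rng = np.random.default_rng(0)
d_in, d_out, n_models = 4, 3, 2
W = rng.standard_normal((d_out, d_in))      # shared reference network weights
R = rng.standard_normal((n_models, d_out))  # per-model output scalings
S = rng.standard_normal((n_models, d_in))   # per-model input scalings
x = rng.standard_normal(d_in)

y = embedded_ensemble_forward(x, W, R, S)
print(y.shape)  # one prediction per ensemble member: (n_models, d_out)
```

The memory overhead per extra ensemble member is only `d_in + d_out` scalars instead of a full `d_out * d_in` weight matrix, which is the efficiency the abstract refers to. How the factors `R`, `S` are initialized is exactly the kind of choice that, per the paper, selects between the independent and collective regimes.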