Paper Title
SuperNet -- An efficient method of neural networks ensembling
Paper Authors
Paper Abstract
The main flaw of neural network ensembling is that it is exceptionally demanding computationally, especially if the individual sub-models are large neural networks that must be trained separately. Bearing in mind that modern DNNs can be very accurate, that they are already huge ensembles of simple classifiers, and that for any ensemble one can construct a more thrifty, compressed neural net of similar performance, the idea of designing expensive SuperNets may seem questionable. The widespread belief that ensembling increases prediction time makes it unattractive, and may be the reason that the mainstream of ML research is directed towards developing better loss functions and learning strategies for more advanced and efficient neural networks. On the other hand, all these factors make the architectures more complex, which may lead to overfitting and high computational complexity, that is, to the same flaws for which highly parametrized SuperNet ensembles are blamed. The goal of this master's thesis is to speed up the execution time required for ensemble generation. Instead of training K inaccurate sub-models separately, each sub-model can be taken from a different phase of training of a single DNN, representing a different local minimum of the loss function [Huang et al., 2017; Garipov et al., 2018]. Thus, the computational cost of the SuperNet can be comparable to the maximum CPU time spent on training a single sub-model, plus the usually much shorter CPU time required for training the SuperNet coupling factors.
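The abstract describes combining sub-models taken from different phases of one DNN's training (different local minima of the loss) through a small set of trained coupling factors. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the combination by softmax-weighted coupling factors is an assumption for illustration, and names such as make_model and snapshot_paths are hypothetical placeholders.

    import torch
    import torch.nn as nn

    class SuperNet(nn.Module):
        """Combine frozen snapshot sub-models via trainable coupling factors."""
        def __init__(self, sub_models):
            super().__init__()
            self.sub_models = nn.ModuleList(sub_models)
            # One coupling factor per snapshot; these are the only trained weights.
            self.coupling = nn.Parameter(torch.ones(len(sub_models)) / len(sub_models))
            for m in self.sub_models:          # snapshots stay frozen
                for p in m.parameters():
                    p.requires_grad_(False)

        def forward(self, x):
            weights = torch.softmax(self.coupling, dim=0)
            # outputs: (K, batch, classes); weighted sum over the K snapshots
            outputs = torch.stack([m(x) for m in self.sub_models], dim=0)
            return (weights.view(-1, 1, 1) * outputs).sum(dim=0)

    # Hypothetical usage: load checkpoints saved at different local minima of a
    # single training run (e.g., via cyclic learning rates as in Huang et al., 2017),
    # then fit only the coupling factors, which is cheap compared to training
    # K separate DNNs.
    #
    # sub_models = []
    # for path in snapshot_paths:
    #     m = make_model()
    #     m.load_state_dict(torch.load(path))
    #     sub_models.append(m)
    # supernet = SuperNet(sub_models)
    # optimizer = torch.optim.Adam([supernet.coupling], lr=1e-2)

Because the snapshot weights are frozen, only the handful of coupling factors is optimized, which is consistent with the abstract's claim that the extra CPU time beyond training the base DNN is usually small.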