Paper Title

Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs

Paper Authors

Lei Huang, Jie Qin, Li Liu, Fan Zhu, Ling Shao

Paper Abstract

Conditioning analysis uncovers the landscape of an optimization objective by exploring the spectrum of its curvature matrix. This has been well explored theoretically for linear models. We extend this analysis to deep neural networks (DNNs) in order to investigate their learning dynamics. To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently. Such an analysis is theoretically supported under mild assumptions that approximately hold in practice. Based on our analysis, we show that batch normalization (BN) can stabilize training, but sometimes results in the false impression of a local minimum, which has detrimental effects on learning. In addition, we experimentally observe that BN can improve the layer-wise conditioning of the optimization problem. Finally, we find that the last linear layer of a very deep residual network displays ill-conditioned behavior. We solve this problem by adding a single BN layer before the last linear layer, which improves performance over both the original and pre-activation residual networks.
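As a rough illustration of the layer-wise analysis described in the abstract, the sketch below estimates a per-layer condition number from the covariance of each linear layer's inputs, whose spectrum governs the layer-local curvature under a Kronecker-factorization assumption. This is a minimal PyTorch sketch, not the authors' exact procedure: the helper name `layer_input_condition`, the single-batch estimate, and the restriction to `nn.Linear` layers are illustrative assumptions, and the paper additionally examines output-gradient statistics.

```python
import torch
import torch.nn as nn

def layer_input_condition(model, data_loader, device="cpu"):
    """Estimate a condition-number proxy for each linear layer.

    Under a Kronecker-factorization view of the layer-wise curvature,
    the spectrum of the input covariance E[x x^T] controls the layer's
    conditioning, so lambda_max / lambda_min of that covariance serves
    as a rough per-layer condition number. Illustrative sketch only.
    """
    captured = {}   # layer name -> list of captured input batches
    hooks = []

    def make_hook(name):
        def hook(module, inp, out):
            # Forward hooks receive the positional inputs as a tuple.
            captured.setdefault(name, []).append(inp[0].detach().flatten(1))
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for x, _ in data_loader:        # assumes (input, target) batches
            model(x.to(device))
            break                       # one batch suffices for a rough estimate

    for h in hooks:
        h.remove()

    results = {}
    for name, batches in captured.items():
        x = torch.cat(batches, dim=0)            # (N, d)
        cov = (x.T @ x) / x.shape[0]             # uncentered covariance E[x x^T]
        eigvals = torch.linalg.eigvalsh(cov)
        eigvals = eigvals[eigvals > 1e-12]       # drop numerically-zero directions
        results[name] = (eigvals.max() / eigvals.min()).item()
    return results
```

A well-conditioned layer yields a ratio near 1, while an ill-conditioned one (such as the last linear layer of a very deep residual network, per the abstract) yields a much larger value.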
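The fix mentioned at the end of the abstract, a single BN layer inserted before the last linear layer, can be sketched as below. This assumes torchvision's ResNet, where the classifier is exposed as the `fc` attribute; `resnet18` stands in here for the much deeper networks studied in the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18

def add_bn_before_classifier(model):
    """Prepend a BatchNorm1d layer to the final linear classifier,
    normalizing its input features as in the paper's remedy for the
    ill-conditioned last layer of very deep residual networks."""
    in_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.BatchNorm1d(in_features),  # the single extra BN layer
        model.fc,
    )
    return model

model = add_bn_before_classifier(resnet18(num_classes=10))
```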
