Paper Title
On Alignment in Deep Linear Neural Networks
Paper Authors
Paper Abstract
We study the properties of alignment, a form of implicit regularization, in linear neural networks under gradient descent. We define alignment for fully connected networks with multidimensional outputs and show that it is a natural extension of alignment in networks with one-dimensional outputs as defined by Ji and Telgarsky (2018). While fully connected networks always admit a global minimum corresponding to an aligned solution, we analyze alignment as it relates to the training process. Namely, we characterize when alignment is an invariant of training under gradient descent by providing necessary and sufficient conditions for this invariant to hold. In such settings, the dynamics of gradient descent simplify, allowing us to provide an explicit learning rate under which the network converges linearly to a global minimum. We then analyze networks with layer constraints, such as convolutional networks. In this setting, we prove that gradient descent is equivalent to projected gradient descent, and that alignment is impossible with sufficiently large datasets.
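The abstract does not reproduce the formal definition of alignment. As a rough intuition in the one-dimensional-output setting of Ji and Telgarsky (2018), alignment means that adjacent weight matrices share their top singular directions. The sketch below is an illustrative experiment under assumed settings, not the paper's construction: it trains a three-layer deep linear network on synthetic least-squares data with plain gradient descent and reports the overlap between the top left singular vector of each layer and the top right singular vector of the next. All sizes, the learning rate, and the loss are hypothetical choices for illustration only.

```python
# Illustrative sketch (assumed setup, not the paper's exact definitions):
# train f(x) = W3 W2 W1 x with gradient descent on a squared loss and
# measure an alignment-style overlap between adjacent layers.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with a one-dimensional output (hypothetical choice).
n, d, h = 50, 10, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star                       # targets from a linear teacher

# Small, roughly balanced initialization of a three-layer deep linear network.
W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((h, h))
W3 = 0.1 * rng.standard_normal((1, h))

lr = 0.01
for step in range(2000):
    P = W3 @ W2 @ W1                 # end-to-end 1 x d linear map
    r = X @ P.T - y[:, None]         # residuals, shape (n, 1)
    G = (X.T @ r).T / n              # gradient of MSE w.r.t. P, shape (1, d)
    # Chain rule through the matrix product P = W3 W2 W1.
    gW3 = G @ (W2 @ W1).T
    gW2 = W3.T @ G @ W1.T
    gW1 = (W3 @ W2).T @ G
    W1 -= lr * gW1
    W2 -= lr * gW2
    W3 -= lr * gW3

def top_vectors(W):
    """Top left and right singular vectors of W."""
    U, _, Vt = np.linalg.svd(W)
    return U[:, 0], Vt[0]

u1, _ = top_vectors(W1)   # top left singular vector of layer 1
_, v2 = top_vectors(W2)   # top right singular vector of layer 2
u2, _ = top_vectors(W2)
_, v3 = top_vectors(W3)

# With a small, nearly balanced initialization these overlaps are
# typically close to 1, i.e. adjacent layers (approximately) align.
print("|<u_1, v_2>| =", abs(u1 @ v2))
print("|<u_2, v_3>| =", abs(u2 @ v3))
```

The printed overlaps illustrate the kind of layer-wise agreement the abstract refers to; the paper's analysis makes this notion precise for multidimensional outputs and characterizes when it is exactly preserved during training.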