Paper title
Deep equilibrium networks are sensitive to initialization statistics
Paper authors
Paper abstract
Deep equilibrium networks (DEQs) are a promising way to construct models that trade memory for compute. However, theoretical understanding of these models still lags behind that of traditional networks, in part because a single set of weights is applied repeatedly. We show that DEQs are sensitive to the higher-order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives a practical prescription for initializations that allow training over a broader range of initial weight scales.
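As an illustration of the matrix families the abstract mentions, the following is a minimal NumPy sketch and not the authors' code. The function names, the `scale` parameter, the variance convention (entries of variance `scale**2 / n`), the `tanh` nonlinearity, and the naive forward iteration are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def gaussian_init(n, scale, rng):
    """i.i.d. Gaussian entries with variance scale**2 / n (assumed convention)."""
    return rng.normal(0.0, scale / np.sqrt(n), size=(n, n))


def orthogonal_init(n, scale, rng):
    """scale times a random orthogonal matrix, obtained by QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    # Flip column signs using diag(R) so that Q is uniformly (Haar) distributed.
    q = q * np.sign(np.diag(r))
    return scale * q


def symmetric_init(n, scale, rng):
    """Symmetrized Gaussian matrix; off-diagonal entries keep variance scale**2 / n."""
    a = gaussian_init(n, scale, rng)
    return (a + a.T) / np.sqrt(2.0)


def fixed_point_residual(w, x, n_iter=200):
    """Iterate z <- tanh(w @ z + x) from z = 0 and return ||f(z) - z||.

    This toy layer and solver are hypothetical stand-ins for a DEQ forward pass.
    """
    z = np.zeros_like(x)
    for _ in range(n_iter):
        z = np.tanh(w @ z + x)
    return np.linalg.norm(np.tanh(w @ z + x) - z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 64
    x = rng.normal(size=n)
    inits = {
        "gaussian": gaussian_init,
        "orthogonal": orthogonal_init,
        "symmetric": symmetric_init,
    }
    for scale in (0.5, 1.0, 2.0):
        for name, init in inits.items():
            w = init(n, scale, rng)
            print(name, scale, fixed_point_residual(w, x))
```

The three families above are matched in the variance of their off-diagonal entries but differ in structure, which is one way to probe the kind of sensitivity the abstract describes. Whether this naive iteration converges at a given `scale` depends on the family and on the assumed nonlinearity, so the printed residuals are only a rough diagnostic.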