Paper Title
On the Locality of the Natural Gradient for Deep Learning
Paper Authors
Paper Abstract
We study the natural gradient method for learning in deep Bayesian networks, including neural networks. There are two natural geometries associated with such learning systems consisting of visible and hidden units. One geometry is related to the full system, the other to the visible sub-system. These two geometries imply different natural gradients. In a first step, we demonstrate a substantial simplification of the natural gradient with respect to the first geometry, due to locality properties of the Fisher information matrix. This simplification does not directly translate to a corresponding simplification with respect to the second geometry. We develop the theory for studying the relation between the two versions of the natural gradient and outline a method for simplifying the natural gradient with respect to the second geometry based on the first one. This method suggests incorporating a recognition model as an auxiliary model for the efficient application of the natural gradient method in deep networks.
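For readers unfamiliar with the method the abstract refers to, the following is a minimal sketch (not taken from the paper) of a plain natural-gradient update, θ ← θ − η F⁻¹∇L, where the Fisher information matrix F is estimated empirically from per-sample score vectors. The toy model, function names, and hyperparameters (`damping`, `eta`) are illustrative assumptions, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def natural_gradient_step(theta, scores, grad, eta=0.1, damping=1e-3):
    """One natural-gradient update (illustrative sketch).

    scores: (n, d) per-sample scores d/dtheta log p(x_i; theta)
    grad:   (d,) gradient of the loss
    The Fisher matrix is approximated by the empirical average of
    outer products of the scores; damping keeps it invertible.
    """
    F = scores.T @ scores / scores.shape[0] + damping * np.eye(theta.size)
    return theta - eta * np.linalg.solve(F, grad)

# Toy usage: fit the mean of a unit-variance Gaussian, where the
# score for a sample x is (x - theta).
theta = np.array([5.0])
data = rng.normal(loc=1.0, scale=1.0, size=200)
for _ in range(50):
    scores = (data - theta)[:, None]             # (n, 1) score vectors
    grad = -(data - theta).mean(keepdims=True)   # gradient of the NLL
    theta = natural_gradient_step(theta, scores, grad, eta=0.5)

print(theta)  # converges toward the sample mean of the data
```

The damping term is a standard practical device for keeping the estimated Fisher matrix well-conditioned; the paper's contribution concerns simplifying F itself via locality, not this generic update.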