Paper Title

Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra

Author

Puiu, Constantin Octavian

Abstract

K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by only focusing on the first few eigen-modes when inverting the Kronecker factors. Importantly, the spectrum decay happens over a constant number of modes irrespective of layer width. This allows us to reduce the time complexity of K-FAC from cubic to quadratic in layer width, partially closing the gap w.r.t. SENG (another practical Natural Gradient implementation for Deep Learning which scales linearly in width). Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain $\approx2.5\times$ reduction in per-epoch time and $\approx3.3\times$ reduction in time to target accuracy. We compare our proposed K-FAC sped-up versions with SENG, and observe that for CIFAR10 classification with VGG16_bn we perform on par with it.
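The core idea the abstract describes, inverting a damped Kronecker factor using only its leading eigenmodes obtained via randomized numerical linear algebra, can be illustrated with a short NumPy sketch. This is not the paper's implementation: the function names, the oversampling/power-iteration parameters, and the choice to treat the discarded eigen-subspace as pure damping are illustrative assumptions in the spirit of standard randomized range-finder methods.

```python
import numpy as np

def randomized_eig(A, rank, n_oversample=10, n_iter=2, seed=0):
    """Truncated eigendecomposition of a symmetric PSD matrix via a
    randomized range finder (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Sketch the range of A with a Gaussian test matrix.
    Q = rng.standard_normal((n, rank + n_oversample))
    for _ in range(n_iter):           # power iterations sharpen the basis
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A @ Q                   # small projected matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:rank]  # keep the top-`rank` eigenmodes
    return w[idx], Q @ V[:, idx]

def approx_damped_inverse_apply(A, g, rank, damping=1e-3):
    """Apply (A + damping*I)^{-1} to g using only the top eigenmodes.
    The residual subspace is approximated by pure damping, which is
    accurate when the eigen-spectrum decays rapidly (assumption)."""
    w, U = randomized_eig(A, rank)
    coeff = U.T @ g
    return U @ (coeff / (w + damping)) + (g - U @ coeff) / damping
```

Because the sketch and the small eigendecomposition cost $O(n^2 r)$ rather than the $O(n^3)$ of a full eigendecomposition, keeping a constant number of modes $r$ as the width $n$ grows yields the cubic-to-quadratic complexity reduction claimed in the abstract.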
