Paper Title
CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices
Paper Authors
Paper Abstract
We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs. As in earlier work, we parametrize an orthogonal matrix as a product of Householder reflections. However, to overcome the low parallelization capability of computing Householder reflections sequentially, we propose employing an accumulation scheme called the compact WY (or CWY) transform -- a compact, parallelization-friendly matrix representation for a series of Householder reflections. We further develop a novel Truncated CWY (or T-CWY) approach for Stiefel manifold parametrization which has competitive complexity and, again, yields benefits when computed on GPUs and TPUs. We prove that our CWY and T-CWY methods lead to convergence to a stationary point of the training objective when coupled with stochastic gradient descent. We apply our methods to train recurrent neural network architectures in the tasks of neural machine translation and video prediction.
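To make the core idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of the compact WY identity the abstract refers to: a product of k Householder reflections H_j = I - 2 u_j u_j^T can be written in the single batched form I - U T U^T, where the columns of U are the (normalized) reflection vectors and T is an upper-triangular k x k matrix built by a standard recurrence. The function names `householder` and `compact_wy` are illustrative, not from the paper.

```python
import numpy as np

def householder(v):
    """Householder reflection I - 2 u u^T for the unit vector u = v/||v||."""
    u = v / np.linalg.norm(v)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

def compact_wy(V):
    """Compact WY form of H_1 H_2 ... H_k for reflection vectors in V's columns.

    Returns Q = I - U T U^T, where U has unit columns and T is upper
    triangular, built by the recurrence Q_+ = Q (I - 2 u u^T).
    """
    n, k = V.shape
    U = V / np.linalg.norm(V, axis=0)   # normalize columns, so each beta = 2
    T = np.zeros((k, k))
    T[0, 0] = 2.0
    for j in range(1, k):
        u = U[:, j]
        T[:j, j] = -2.0 * T[:j, :j] @ (U[:, :j].T @ u)
        T[j, j] = 2.0
    return np.eye(n) - U @ T @ U.T

# Check against the sequential product of reflections.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 3))
Q_seq = np.eye(6)
for j in range(V.shape[1]):
    Q_seq = Q_seq @ householder(V[:, j])
Q_cwy = compact_wy(V)
assert np.allclose(Q_seq, Q_cwy)               # same matrix, one batched formula
assert np.allclose(Q_cwy.T @ Q_cwy, np.eye(6))  # and it is orthogonal
```

The practical point is that the sequential loop of k reflection applications collapses into a few dense matrix multiplications (U T U^T), which map well onto GPU/TPU hardware.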