Paper Title

AlgebraNets

Paper Authors

Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

Paper Abstract

Neural networks have historically been built layerwise from the set of functions in $\{f: \mathbb{R}^n \to \mathbb{R}^m\}$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$. Our work considers a richer set of objects for activations and weights, and undertakes a comprehensive study of alternative algebras as number representations, evaluating their performance on two challenging problems: large-scale image classification using the ImageNet dataset and language modeling using the enwiki8 and WikiText-103 datasets. We denote this broader class of models as AlgebraNets. Our findings indicate that the conclusions of prior work, which explored neural networks constructed from $\mathbb{C}$ (complex numbers) and $\mathbb{H}$ (quaternions) on smaller datasets, do not always transfer to these challenging settings. However, our results demonstrate that there are alternative algebras which deliver better parameter and computational efficiency compared with $\mathbb{R}$. We consider $\mathbb{C}$, $\mathbb{H}$, $M_{2}(\mathbb{R})$ (the set of $2\times2$ real-valued matrices), $M_{2}(\mathbb{C})$, $M_{3}(\mathbb{R})$ and $M_{4}(\mathbb{R})$. Additionally, we note that multiplication in these algebras has higher compute density than real multiplication, a useful property in situations with inherently limited parameter reuse such as auto-regressive inference and sparse neural networks. We therefore investigate how to induce sparsity within AlgebraNets. We hope that our strong results on large-scale, practical benchmarks will spur further exploration of these unconventional architectures which challenge the default choice of using real numbers for neural network weights and activations.
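To make the idea concrete, the sketch below shows a minimal dense layer over $M_{2}(\mathbb{R})$, where every weight and activation is a $2\times2$ real matrix and the scalar multiply-accumulate of an ordinary layer becomes a matrix multiply-accumulate. This is only an illustration of the representation the abstract describes, not the paper's implementation; the function name `m2_dense` and the array shapes are assumptions made for this example.

```python
# Hypothetical sketch of a dense layer over M_2(R): each "number" is a 2x2 real matrix.
# Illustrates the AlgebraNets idea; not the authors' actual implementation.
import numpy as np

def m2_dense(x, w, b):
    """Dense layer whose scalars live in M_2(R).

    x: activations, shape (n_in, 2, 2)        -- n_in matrix-valued inputs
    w: weights,     shape (n_out, n_in, 2, 2) -- matrix-valued weights
    b: biases,      shape (n_out, 2, 2)
    Returns matrix-valued activations of shape (n_out, 2, 2).
    """
    # For each output unit o: sum_i w[o, i] @ x[i] (2x2 matrix products).
    # Each 2x2 matrix product uses 8 real multiplies for 4 weight entries,
    # which is the higher compute density per parameter mentioned above.
    return np.einsum('oiab,ibc->oac', w, x) + b

# Toy usage: 3 matrix-valued inputs, 2 matrix-valued outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 2, 2))
w = rng.standard_normal((2, 3, 2, 2))
b = rng.standard_normal((2, 2, 2))
print(m2_dense(x, w, b).shape)  # (2, 2, 2)
```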
