Paper title
Tensor Algebra on an Optoelectronic Microchip
Authors
Abstract
Tensor algebra lies at the core of computational science and machine learning. Because it is so heavily used, entire libraries are dedicated to improving its performance. Conventional efforts to speed up tensor algebra focus on algorithmic optimizations, which in turn lead to incremental improvements. In this paper, we describe a different way to accelerate tensor algebra: outsourcing operations to an optical microchip. We outline a numerical programming language, developed to perform tensor algebra computations, that is designed to leverage our optical hardware's full potential. We introduce the language's current grammar and describe the compiler design. We then show a new way to store sparse rank-$n$ tensors in RAM that outperforms conventional array storage (as used by C++, Java, etc.). This method is more memory-efficient than the Compressed Sparse Fiber (CSF) format and is specifically tuned for our optical hardware. Finally, we show how the scalar-tensor product, rank-$n$ Kronecker product, tensor dot product, Khatri-Rao product, face-splitting product, and vector cross product can be compiled into operations native to our optical microchip through various tensor decompositions.
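For readers unfamiliar with three of the products named in the abstract, the following minimal sketch spells out their standard textbook definitions for ordinary rank-2 matrices stored as Python lists of rows. It is only an illustration: the function names `kron`, `khatri_rao`, and `face_splitting` are hypothetical, and nothing here reflects the paper's programming language, its sparse storage format, or how the operations are compiled for the optical hardware.

```python
# Illustrative definitions only; names are hypothetical, not the paper's API.

def kron(a, b):
    """Kronecker product: every entry of a scales a full copy of b."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def khatri_rao(a, b):
    """Khatri-Rao product: column-wise Kronecker product.

    a is m x n and b is p x n; the result is (m*p) x n.
    """
    n = len(a[0])
    if len(b[0]) != n:
        raise ValueError("operands must have the same number of columns")
    return [[row_a[j] * row_b[j] for j in range(n)]
            for row_a in a
            for row_b in b]

def face_splitting(a, b):
    """Face-splitting product: row-wise Kronecker product.

    a is m x n and b is m x q; the result is m x (n*q).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of rows")
    return [[x * y for x in row_a for y in row_b]
            for row_a, row_b in zip(a, b)]

def transpose(m):
    """Matrix transpose for a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]

print(kron(A, B))            # 4 x 4 matrix
print(khatri_rao(A, B))      # 4 x 2 matrix
print(face_splitting(A, B))  # 2 x 4 matrix
```

The Khatri-Rao and face-splitting products are transposes of one another in the sense that `khatri_rao(A, B)` equals `transpose(face_splitting(transpose(A), transpose(B)))`, which is a convenient sanity check for any implementation.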