Paper Title
Efficient Implementation of Multi-Channel Convolution in Monolithic 3D ReRAM Crossbar
Paper Authors
Paper Abstract
Convolutional neural networks (CNNs) demonstrate promising accuracy in a wide range of applications. Among all layers in CNNs, convolution layers are the most computation-intensive and consume the most energy. As device and fabrication technology matures, 3D resistive random access memory (ReRAM) has received substantial attention for accelerating large vector-matrix multiplication and convolution because of its high parallelism and energy efficiency. However, implementing multi-channel convolution naively in 3D ReRAM will either produce incorrect results or exploit only part of the parallelism of 3D ReRAM. In this paper, we propose a 3D ReRAM-based convolution accelerator architecture that efficiently maps multi-channel convolution to monolithic 3D ReRAM. Our design has two key principles. First, we exploit the intertwined structure of 3D ReRAM to implement multi-channel convolution by using a state-of-the-art convolution algorithm. Second, we propose a new approach to efficiently implement negative weights by separating them from non-negative weights using configurable interconnects. Our evaluation demonstrates that our mapping scheme in 16-layer 3D ReRAM achieves speedups of 5.79X, 927.81X, and 36.8X compared with a custom 2D ReRAM baseline, a state-of-the-art CPU, and a state-of-the-art GPU, respectively. Our design also reduces energy consumption by 2.12X, 1802.64X, and 114.1X compared with the same baselines, respectively.
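To make the two ideas in the abstract concrete, the following is a minimal software sketch, not the mapping scheme proposed in the paper. It assumes a generic im2col-style lowering of multi-channel convolution to vector-matrix multiplication, and it assumes the common decomposition W = W_pos - W_neg so that both stored matrices hold only non-negative values (as crossbar conductances would). All function names (`split_weights`, `vmm`, `im2col_patch`, `conv_multichannel`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: generic im2col lowering plus weight splitting.
# This is NOT the paper's 3D ReRAM mapping; all names here are hypothetical.

def split_weights(w):
    """Split matrix w into two non-negative matrices with w = w_pos - w_neg."""
    w_pos = [[max(x, 0.0) for x in row] for row in w]
    w_neg = [[max(-x, 0.0) for x in row] for row in w]
    return w_pos, w_neg

def vmm(v, m):
    """Vector-matrix product; stands in for one analog crossbar read."""
    return [sum(v[i] * m[i][j] for i in range(len(m))) for j in range(len(m[0]))]

def im2col_patch(x, r, c, k):
    """Flatten the k x k window at (r, c) across all channels of x[ch][row][col]."""
    return [x[ch][r + i][c + j]
            for ch in range(len(x)) for i in range(k) for j in range(k)]

def conv_multichannel(x, kernels):
    """Valid, stride-1 cross-correlation (the usual CNN 'convolution').

    x is indexed [channel][row][col]; kernels is indexed [filter][channel][i][j].
    Returns out[filter][row][col].
    """
    k = len(kernels[0][0])
    h, w = len(x[0]), len(x[0][0])
    # One flattened row per filter, then transpose: rows = channels*k*k, cols = filters.
    flat = [[kern[ch][i][j]
             for ch in range(len(x)) for i in range(k) for j in range(k)]
            for kern in kernels]
    wmat = [list(col) for col in zip(*flat)]
    w_pos, w_neg = split_weights(wmat)
    out = [[[0.0] * (w - k + 1) for _ in range(h - k + 1)] for _ in kernels]
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            v = im2col_patch(x, r, c, k)
            pos = vmm(v, w_pos)  # contribution of non-negative weights
            neg = vmm(v, w_neg)  # contribution of negative weights (stored as magnitudes)
            for f in range(len(kernels)):
                out[f][r][c] = pos[f] - neg[f]
    return out

if __name__ == "__main__":
    x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
    kernels = [[[[1, -1], [0, 2]], [[-1, 0], [0, 1]]]]
    print(conv_multichannel(x, kernels))
```

In this sketch the subtraction `pos[f] - neg[f]` is done digitally after two separate products. How the paper separates negative from non-negative weights with configurable interconnects in hardware is not described in the abstract, so it is not modeled here.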