Paper Title

Hardware-Robust In-RRAM-Computing for Object Detection

Authors

Chiang, Yu-Hsiang; Ni, Cheng En; Sung, Yun; Hou, Tuo-Hung; Chang, Tian-Sheuan; Jou, Shyh Jye

Abstract

In-memory computing has recently become a popular architecture for deep-learning hardware accelerators because of its highly parallel computation, low power, and low area cost. However, in-RRAM computing (IRC) suffers from large device variation and numerous nonideal effects in hardware. Although previous approaches that include these effects in model training have successfully improved variation tolerance, they considered only some of the nonideal effects and relatively simple classification tasks. This paper proposes a joint hardware and software optimization strategy to design a hardware-robust IRC macro for object detection. We lower the cell current by using a low word-line voltage, which enables a complete convolution calculation in one operation and minimizes the impact of nonlinear addition. We also implement ternary weight mapping and remove batch normalization for better tolerance against device variation, sense amplifier variation, and IR drop. An extra bias is included to overcome the limitation of the current sensing range. The proposed approach has been successfully applied to a complex object detection task with only a 3.85% mAP drop, whereas a naive design suffers catastrophic failure under these nonideal effects.
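To make the idea of ternary weight mapping under device variation more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a threshold-based ternarization (the `threshold_ratio` of 0.7 is a hypothetical choice) and models device variation as multiplicative Gaussian noise on each nonzero weight; the function names `ternarize` and `noisy_dot`, the `sigma` parameter, and the sample values are all assumptions introduced for illustration.

```python
import random


def ternarize(weights, threshold_ratio=0.7):
    """Map real-valued weights to {-1, 0, +1}.

    The threshold (a scaled mean absolute value) is a hypothetical choice,
    not a value taken from the paper.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def noisy_dot(inputs, ternary_weights, sigma=0.1, bias=0.0, rng=None):
    """Accumulate a dot product with per-weight multiplicative Gaussian noise.

    The noise is a simple stand-in for device variation; `bias` plays the role
    of an extra offset added to the accumulated result.
    """
    rng = rng or random.Random(0)
    total = bias
    for x, w in zip(inputs, ternary_weights):
        if w != 0:  # zero weights contribute nothing in this simplified model
            total += x * w * (1.0 + rng.gauss(0.0, sigma))
    return total


weights = [0.9, -0.05, -0.8, 0.1, 0.4]
inputs = [1, 1, 0, 1, 1]

t = ternarize(weights)
ideal = noisy_dot(inputs, t, sigma=0.0)                      # no variation
noisy = noisy_dot(inputs, t, sigma=0.1, rng=random.Random(42))  # with variation
print(t, ideal, noisy)
```

In this toy model a zero weight contributes no noise at all and each nonzero weight contributes a bounded relative error, which gives a rough intuition for why coarse weight levels can tolerate variation. The actual mapping of weights to RRAM cells and the modeling of sense amplifier variation and IR drop are described in the paper itself and are not reproduced here.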
