Restructurable Activation Networks

Paper Title

Restructurable Activation Networks

Authors

Kartikeya Bhardwaj, James Ward, Caleb Tung, Dibakar Gope, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh

Abstract

Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of non-linearity in models to improve their hardware-awareness and efficiency. First, we propose RAN-explicit (RAN-e) -- a new hardware-aware search space and a semi-automatic search algorithm -- to replace inefficient blocks with hardware-aware blocks. Next, we propose a training-free model scaling method called RAN-implicit (RAN-i) where we theoretically prove the link between network topology and its expressivity in terms of number of non-linear units. We demonstrate that our networks achieve state-of-the-art results on ImageNet at different scales and for several types of hardware. For example, compared to EfficientNet-Lite-B0, RAN-e achieves a similar accuracy while improving Frames-Per-Second (FPS) by 1.5x on Arm micro-NPUs. On the other hand, RAN-i demonstrates up to 2x reduction in #MACs over ConvNexts with a similar or better accuracy. We also show that RAN-i achieves nearly 40% higher FPS than ConvNext on Arm-based datacenter CPUs. Finally, RAN-i based object detection networks achieve a similar or higher mAP and up to 33% higher FPS on datacenter CPUs compared to ConvNext based models. The code to train and evaluate RANs and the pretrained networks are available at https://github.com/ARM-software/ML-restructurable-activation-networks.
