Paper Title
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Paper Authors
Paper Abstract
This work introduces RevSilo, the first reversible bidirectional multi-scale feature fusion module. Like other reversible methods, RevSilo eliminates the need to store hidden activations by recomputing them. However, existing reversible methods do not apply to multi-scale feature fusion and are therefore not applicable to a large class of networks. Bidirectional multi-scale feature fusion promotes local and global coherence and has become a de facto design principle for networks targeting spatially sensitive tasks, e.g., HRNet (Sun et al., 2019a) and EfficientDet (Tan et al., 2020). These networks achieve state-of-the-art results across various computer vision tasks when paired with high-resolution inputs. However, training them requires substantial accelerator memory for saving large, multi-resolution activations. These memory requirements inherently cap the size of neural networks, limiting improvements that come from scale. Operating across resolution scales, RevSilo alleviates these issues. Stacking RevSilos, we create RevBiFPN, a fully reversible bidirectional feature pyramid network. RevBiFPN is competitive with networks such as EfficientNet while using up to 19.8x less training memory for image classification. When fine-tuned on MS COCO, RevBiFPN provides up to a 2.5% boost in AP over HRNet using fewer MACs and a 2.4x reduction in training-time memory.
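The abstract's core idea, that a reversible block lets hidden activations be recomputed from outputs instead of stored, can be illustrated with a minimal additive-coupling sketch in the style of RevNet-type reversible methods. The functions `F` and `G` below are hypothetical placeholder transforms, not the paper's actual sub-networks, and this is a single-scale toy, not the multi-scale RevSilo itself:

```python
# Minimal sketch of an additive-coupling reversible block.
# F and G are arbitrary placeholder functions (assumptions for
# illustration); any functions work, since invertibility comes
# from the coupling structure, not from F or G.

def F(x):
    return [2.0 * v + 1.0 for v in x]

def G(x):
    return [v * v for v in x]

def forward(x1, x2):
    # y1 = x1 + F(x2); y2 = x2 + G(y1)
    y1 = [a + b for a, b in zip(x1, F(x2))]
    y2 = [a + b for a, b in zip(x2, G(y1))]
    return y1, y2

def inverse(y1, y2):
    # Exactly recover the inputs by subtracting in reverse order,
    # so the forward pass never needs to cache x1, x2.
    x2 = [a - b for a, b in zip(y2, G(y1))]
    x1 = [a - b for a, b in zip(y1, F(x2))]
    return x1, x2

x1, x2 = [1.0, 2.0], [3.0, 4.0]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert r1 == x1 and r2 == x2  # round-trip is exact
```

During backpropagation, such a block recomputes its inputs from its outputs on the fly, trading extra compute for the memory savings the abstract quantifies (e.g., up to 19.8x for image classification). RevSilo's contribution is extending this property to bidirectional fusion across resolution scales.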