CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

Paper title

CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

Authors

Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek

Abstract

Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. However, this leads to inefficient memory accesses of inputs on existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications, including irregular-access versus limited-range and fixed-shape. We then Co-Design a Network, CoDeNet, with the modified deformable convolution and quantize it to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher-accuracy implementation, our model reaches 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, making it 20.9x smaller but 10% more accurate than Tiny-YOLO.
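To make the "4-bit weights and 8-bit activations" step concrete, the following is a minimal sketch of uniform quantization in plain Python. The abstract does not state which quantization scheme the authors use, so the symmetric per-tensor scheme here is an assumption, and the function names (`quantize_symmetric`, `dequantize`, `packed_size_mb`) and sample values are hypothetical.

```python
def quantize_symmetric(values, num_bits):
    """Map floats to signed integer codes in [-qmax, qmax].

    Assumed scheme (not taken from the paper): one scale per tensor,
    chosen so the largest magnitude maps to qmax.
    Returns (codes, scale) with value ~= code * scale.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def packed_size_mb(num_values, num_bits):
    """Storage for num_values codes packed at num_bits each (1 MB = 1e6 bytes)."""
    return num_values * num_bits / 8 / 1e6


# Hypothetical sample tensors, for illustration only.
weights = [0.7, -0.33, 0.12, 0.0]
activations = [1.27, 0.5, -0.02, 0.9]

w_codes, w_scale = quantize_symmetric(weights, num_bits=4)
a_codes, a_scale = quantize_symmetric(activations, num_bits=8)
w_restored = dequantize(w_codes, w_scale)
```

Under this scheme, the rounding error of each restored value is at most half a quantization step (`scale / 2`), and 4-bit storage needs half a byte per weight, so `packed_size_mb(1_000_000, 4)` gives 0.5 MB for a hypothetical one-million-weight model.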
