A Simple Fix for Convolutional Neural Network via Coordinate Embedding

Paper Title

A Simple Fix for Convolutional Neural Network via Coordinate Embedding

Authors

Ren, Liliang; Hao, Zhuonan

Abstract

Convolutional Neural Networks (CNNs) have been widely applied in computer vision. However, because CNN models are translation invariant, they are not aware of the coordinates of each pixel. This limits the generalization ability of CNNs, since coordinate information is crucial for a model to learn affine transformations, which operate directly on the coordinates of each pixel. In this project, we propose a simple approach that incorporates coordinate information into a CNN model through coordinate embedding. Our approach does not change the downstream model architecture and can easily be applied to pre-trained models for tasks such as object detection. Our experiments on the German Traffic Sign Detection Benchmark show that our approach not only significantly improves model performance but also makes the model more robust to affine transformations.
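The abstract does not specify the form of the coordinate embedding, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes each pixel's (y, x) position is normalized to [-1, 1] and mapped through a per-channel linear embedding `wy[c] * y + wx[c] * x` that is *added* to the input. Adding rather than concatenating keeps the H x W x C shape unchanged, which is consistent with the claim that the downstream architecture is untouched. (CoordConv-style concatenation of coordinate channels is a common alternative, but it changes the first layer's input channel count.) The names `add_coordinate_embedding`, `pointwise_conv`, `one_hot_row`, and the weights `wy`/`wx` are hypothetical. The demo uses a 1x1 convolution to show the translation problem: without coordinates the filter gives the same response to a feature wherever it appears, and with the embedding the response depends on position.

```python
from typing import List, Tuple

# Hypothetical layout: an image is H x W x C nested lists of floats.
Image = List[List[List[float]]]


def normalized_coords(i: int, j: int, h: int, w: int) -> Tuple[float, float]:
    """Map pixel (row i, column j) to (y, x) in [-1, 1]; a 1-pixel axis maps to 0."""
    y = 2.0 * i / (h - 1) - 1.0 if h > 1 else 0.0
    x = 2.0 * j / (w - 1) - 1.0 if w > 1 else 0.0
    return y, x


def add_coordinate_embedding(image: Image, wy: List[float], wx: List[float]) -> Image:
    """Add a per-channel linear coordinate embedding wy[c]*y + wx[c]*x to every pixel.

    The output keeps the input's H x W x C shape, so a downstream CNN needs no
    change. wy and wx are hypothetical weights (assumed learned in practice)
    with one entry per channel.
    """
    h, w = len(image), len(image[0])
    out: Image = []
    for i, row in enumerate(image):
        new_row = []
        for j, pixel in enumerate(row):
            y, x = normalized_coords(i, j, h, w)
            new_row.append([v + wy[c] * y + wx[c] * x for c, v in enumerate(pixel)])
        out.append(new_row)
    return out


def pointwise_conv(image: Image, weights: List[float]) -> List[List[float]]:
    """A 1x1 convolution with one output channel: a dot product at each pixel."""
    return [[sum(v * k for v, k in zip(pixel, weights)) for pixel in row] for row in image]


def one_hot_row(width: int, col: int) -> Image:
    """A 1 x width single-channel image with one bright pixel at `col`."""
    return [[[1.0 if j == col else 0.0] for j in range(width)]]


if __name__ == "__main__":
    # The same feature (a bright pixel) at two different horizontal positions.
    img_a, img_b = one_hot_row(5, 1), one_hot_row(5, 3)

    # Without coordinates, the response at the feature is identical: position is invisible.
    plain_a = pointwise_conv(img_a, [1.0])
    plain_b = pointwise_conv(img_b, [1.0])
    print(plain_a[0][1], plain_b[0][3])  # prints: 1.0 1.0

    # With the additive embedding (wy=[0.0], wx=[1.0]), the response encodes x.
    emb_a = pointwise_conv(add_coordinate_embedding(img_a, [0.0], [1.0]), [1.0])
    emb_b = pointwise_conv(add_coordinate_embedding(img_b, [0.0], [1.0]), [1.0])
    print(emb_a[0][1], emb_b[0][3])  # prints: 0.5 1.5
```

In a real model the embedding would presumably be applied to image tensors (or feature maps) with learned weights. The nested-list version above only isolates the idea that the same filter can respond differently to the same pattern once position is injected.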
