Paper Title

Understanding when spatial transformer networks do not support invariance, and what to do about it

Authors

Lukas Finnveden, Ylva Jansson, Tony Lindeberg

Abstract

Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.
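The core claim, that a purely spatial warp cannot in general align the feature maps of a transformed image with those of the original, can be illustrated with a minimal NumPy sketch. This is not code from the paper; it uses a single random filter and a 90° rotation, chosen so the spatial transform is exact (no interpolation error):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Plain 2-D cross-correlation with 'valid' padding."""
    h, w = kernel.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * kernel)
    return out

img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))  # a generic, non-symmetric filter

# Feature map of the original image.
feat_orig = conv2d_valid(img, kernel)

# Feature map of the rotated image (np.rot90 = exact 90° rotation).
feat_rot = conv2d_valid(np.rot90(img), kernel)

# An STN acting on the feature map can only undo the spatial part:
feat_aligned = np.rot90(feat_rot, k=-1)

# For a generic filter the maps do not match: the filter itself was
# never rotated, so no purely spatial warp can align them.
print(np.allclose(feat_aligned, feat_orig))  # False

# By contrast, transforming the filter as well (which is no longer a
# purely spatial operation on the feature map) aligns exactly:
feat_rot_both = conv2d_valid(np.rot90(img), np.rot90(kernel))
print(np.allclose(np.rot90(feat_rot_both, k=-1), feat_orig))  # True
```

The sketch mirrors the structure of the paper's argument: undoing the image transformation on the feature maps fails for a generic filter, and succeeds only when the filters are transformed alongside the spatial grid.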
