SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

Paper Title

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

Authors

Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Fei Wu, Futai Zou

Abstract

Arbitrary text appearance poses a great challenge in scene text recognition. Existing works mostly address shape distortion, including perspective distortion, line curvature, and other style variations; methods based on spatial transformers have therefore been studied extensively. However, chromatic difficulties in complex scenes have received little attention. In this work, we introduce a new learnable, geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows color manipulation of the source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream task, giving neural networks the ability to actively transform input intensity rather than relying on the existing spatial rectification. It can also serve as a complement to known spatial transformations, working with them either independently or collaboratively. Extensive experiments show that SPIN yields a significant improvement on multiple text recognition benchmarks compared to the state of the art.
