Paper Title

Exploring Font-independent Features for Scene Text Recognition

Paper Authors

Yizhi Wang, Zhouhui Lian

Abstract

Scene text recognition (STR) has been extensively studied in the last few years. Many recently proposed methods are specially designed to accommodate the arbitrary shapes, layouts, and orientations of scene texts, but ignore the fact that various font (or writing) styles also pose severe challenges to STR. These methods, in which the font features and content features of characters are entangled, perform poorly at recognizing text in scene images with novel font styles. To address this problem, we explore font-independent features of scene texts via attentional generation of glyphs in a large number of font styles. Specifically, we introduce trainable font embeddings to shape the font styles of generated glyphs, so that the image feature of the scene text represents only its essential patterns. The generation process is directed by a spatial attention mechanism, which effectively copes with irregular texts and generates higher-quality glyphs than existing image-to-image translation methods. Experiments conducted on several STR benchmarks demonstrate the superiority of our method compared to the state of the art.
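The two ingredients named in the abstract can be illustrated with a minimal sketch: a spatial attention step that pools a font-independent content vector from an image feature map, and a trainable font-embedding table that is concatenated with that content vector to condition glyph generation on a chosen font style. All dimensions, weights, and variable names below are illustrative assumptions, not the paper's actual architecture; the generator network itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not from the paper):
# H x W x C feature map, K font styles, embedding dim D.
H, W, C, K, D = 8, 32, 64, 10, 64

feat = rng.standard_normal((H, W, C))    # scene-text image features
font_emb = rng.standard_normal((K, D))   # trainable font embeddings (learned in practice)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Spatial attention: score every spatial location, then pool a single
# content vector as the attention-weighted sum of local features.
w_att = rng.standard_normal(C)           # toy scoring weights
scores = feat.reshape(-1, C) @ w_att     # one score per location, shape (H*W,)
alpha = softmax(scores)                  # attention weights over locations
content = alpha @ feat.reshape(-1, C)    # font-independent content vector, shape (C,)

# Condition glyph generation on the content vector plus a chosen font style.
font_id = 3
cond = np.concatenate([content, font_emb[font_id]])  # generator input, shape (C + D,)
```

Because the same `content` vector is paired with every font embedding during training, the model is pushed to keep font style out of the image feature, which is the source of the font independence the abstract describes.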
