Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Paper Title

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Authors

Yu, Deli; Li, Xuan; Zhang, Chengquan; Han, Junyu; Liu, Jingtuo; Ding, Errui

Abstract

Scene text images contain two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to model semantic information implicitly. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit both the benefit of semantic information and computational efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named the semantic reasoning network (SRN) for accurate scene text recognition, in which a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. State-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, SRN has a significant speed advantage over RNN-based methods, demonstrating its value in practical use.
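The abstract contrasts one-way serial transmission of semantic context (RNN-style decoding) with multi-way parallel transmission (GSRM). The toy Python sketch below illustrates only that contrast; it is not the GSRM described in the paper. Both function names are hypothetical, `serial_context` is a weight-free recurrence, and `parallel_context` is an unparameterized dot-product self-attention. Character embeddings are assumed to be plain lists of floats.

```python
import math
from typing import List

Vector = List[float]


def serial_context(embs: List[Vector]) -> List[Vector]:
    """RNN-like one-way transmission (toy, no learned weights).

    The context for position t summarizes positions < t only, and each
    step must wait for the previous state, so steps run sequentially.
    """
    dim = len(embs[0])
    state = [0.0] * dim
    contexts = []
    for e in embs:
        contexts.append(list(state))  # only earlier characters are visible here
        state = [math.tanh(s + x) for s, x in zip(state, e)]
    return contexts


def parallel_context(embs: List[Vector]) -> List[Vector]:
    """Self-attention-style multi-way transmission (toy, no learned weights).

    Every position attends to every position in both directions, and each
    position's computation is independent of the others, so all positions
    could be computed in parallel.
    """
    dim = len(embs[0])
    scale = 1.0 / math.sqrt(dim)
    contexts = []
    for q in embs:  # iterations do not depend on each other
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in embs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        contexts.append(
            [sum(w * v[d] for w, v in zip(weights, embs)) for d in range(dim)]
        )
    return contexts
```

For example, changing only the last embedding leaves `serial_context(...)[0]` unchanged, because the first position never sees later characters, while `parallel_context(...)[0]` changes, because every position receives context from the whole sequence.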
