All You Need Is a Second Look: Towards Tighter Arbitrary Shape Text Detection

Paper title

All You Need Is a Second Look: Towards Tighter Arbitrary Shape Text Detection

Authors

Cao, Meng; Zou, Yuexian

Abstract

Deep learning-based scene text detection methods have progressed substantially over the past years. However, several problems remain to be solved. First, long curved text instances tend to be fragmented because of the limited receptive field of CNNs. Second, simple representations using rectangular or quadrilateral bounding boxes fall short when dealing with more challenging arbitrary-shaped text. Third, the scale of text instances varies greatly, which makes accurate prediction with a single segmentation network difficult. To address these problems, we propose a two-stage, segmentation-based arbitrary-shaped text detector named NASK (Need A Second looK). Specifically, NASK consists of a Text Instance Segmentation network, TIS (1st stage), a Text RoI Pooling module, and a Fiducial pOint eXpression module, FOX (2nd stage). First, TIS performs instance segmentation to obtain rectangular text proposals, using a proposed Group Spatial and Channel Attention module (GSCA) to augment the feature expression. Then, Text RoI Pooling transforms these rectangles to a fixed size. Finally, FOX reconstructs text instances with a tighter representation using the predicted geometrical attributes: text center line, text line orientation, character scale, and character orientation. Experimental results on two public benchmarks, Total-Text and SCUT-CTW1500, demonstrate that the proposed NASK achieves state-of-the-art results.
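
The final step, rebuilding a tight text outline from predicted geometrical attributes, can be illustrated with a minimal sketch. This is not the authors' FOX implementation, which the abstract does not detail. It assumes that sampled center-line points, a character orientation (in radians) and a character scale (full character height) are already available at each point, and it offsets each point by half the scale in both directions along that orientation. The function name `reconstruct_polygon` and its parameters are hypothetical.

```python
import math


def reconstruct_polygon(centers, char_angles, char_scales):
    """Build a closed polygon around a text center line.

    Assumed inputs (illustrative, not from the paper):
      centers     -- list of (x, y) points sampled along the center line
      char_angles -- character orientation at each point, in radians
      char_scales -- full character height at each point
    """
    if not (len(centers) == len(char_angles) == len(char_scales)):
        raise ValueError("all inputs must have the same length")

    upper, lower = [], []
    for (cx, cy), theta, scale in zip(centers, char_angles, char_scales):
        # Half the character height, projected onto the x and y axes.
        dx = 0.5 * scale * math.cos(theta)
        dy = 0.5 * scale * math.sin(theta)
        upper.append((cx + dx, cy + dy))
        lower.append((cx - dx, cy - dy))

    # Walk one side forward and the other side backward to close the outline.
    return upper + lower[::-1]


if __name__ == "__main__":
    # A straight, horizontal center line with upright characters of height 4.
    polygon = reconstruct_polygon(
        centers=[(0.0, 0.0), (10.0, 0.0)],
        char_angles=[math.pi / 2, math.pi / 2],
        char_scales=[4.0, 4.0],
    )
    print(polygon)
```

Because every boundary point follows the local orientation and scale, the outline bends with a curved center line, which a single rectangle or quadrilateral cannot do.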
