Paper Title

OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Paper Authors

Mohamed Yousef, Tom E. Bishop

Paper Abstract

Text recognition is a major computer vision task with a big set of associated challenges. One of those traditional challenges is the coupled nature of text recognition and segmentation. This problem has been progressively solved over the past decades, going from segmentation-based recognition to segmentation-free approaches, which proved more accurate and much cheaper to annotate data for. We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition. We propose a novel and simple neural network module, termed \textbf{OrigamiNet}, that can augment any CTC-trained, fully convolutional single line text recognizer, converting it into a multi-line version by providing the model with enough spatial capacity to properly collapse a 2D input signal into 1D without losing information. Such modified networks can be trained using exactly the same simple original procedure, using only \textbf{unsegmented} image and text pairs. We carry out a set of interpretability experiments that show that our trained models learn an accurate implicit line segmentation. We achieve state-of-the-art character error rate on both the IAM \& ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature. On IAM we even surpass single line methods that use accurate localization information during training. Our code is available online at \url{https://github.com/IntuitionMachines/OrigamiNet}.
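The abstract describes the key mechanism only in words: the module gives a fully convolutional, CTC-trained line recognizer enough vertical spatial capacity to rearrange a multi-line 2D feature map into one long 1D sequence before the usual width collapse. Below is a minimal PyTorch sketch of that idea; the layer counts, target sizes (`tall`, `narrow`), and interpolation choices are illustrative assumptions, not the paper's exact published configuration (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnfoldAndCollapse(nn.Module):
    """Illustrative sketch of OrigamiNet-style 2D-to-1D unfolding.

    All sizes here are assumptions for demonstration, not the paper's
    published configuration.
    """

    def __init__(self, channels: int, tall: int = 1024, narrow: int = 32):
        super().__init__()
        self.tall, self.narrow = tall, narrow
        # Convolutions interleaved with the upscaling steps give the model
        # the spatial capacity to pull text from different lines apart
        # vertically, i.e. to "unfold" the page into one long line.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from any fully convolutional backbone.
        x = F.interpolate(x, size=(self.tall // 2, self.narrow * 2),
                          mode='bilinear', align_corners=False)
        x = F.relu(self.conv1(x))
        # Final resize to a very tall, narrow map: height now plays the role
        # that width plays in a single-line recognizer.
        x = F.interpolate(x, size=(self.tall, self.narrow),
                          mode='bilinear', align_corners=False)
        x = F.relu(self.conv2(x))
        # Standard CTC-style collapse: average out the (now narrow) width,
        # leaving a 1D sequence of length `tall` along the vertical axis.
        return x.mean(dim=3)  # (B, C, tall)
```

A linear layer over the channel dimension then produces per-step class logits, and the whole network can be trained end to end with `nn.CTCLoss` on unsegmented page/transcript pairs, exactly as one would train a single-line recognizer; this is what lets the abstract claim that no line segmentation or localization annotation is needed.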
