Paper Title

Text Recognition in Real Scenarios with a Few Labeled Samples

Authors

Jinghuang Lin, Zhanzhan Cheng, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng Zhou

Abstract

Scene text recognition (STR) is still a hot research topic in the computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are lacking. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieving sequence-level adaptation even with a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the fine-tuning scheme and obtains performance comparable to state-of-the-art STR methods.
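To make the character-level adversarial adaptation described above more concrete, the sketch below illustrates one common way such domain confusion can be implemented. It is not the authors' code: the module names (`CharDomainDiscriminator`, `adaptation_loss`), the feature dimension, and the use of a gradient-reversal layer are assumptions for illustration; the paper's actual adversarial formulation and attention decoder may differ.

```python
# Minimal sketch (assumed PyTorch setup, not the authors' implementation) of
# character-level adversarial domain adaptation: character features from the
# synthetic source domain and the real target domain are fed to a domain
# discriminator, and reversed gradients push the recognizer toward
# domain-confused (domain-invariant) character representations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class CharDomainDiscriminator(nn.Module):
    """Classifies each character-level feature as source (synthetic) or target (real)."""

    def __init__(self, feat_dim=256):  # feat_dim is a hypothetical choice
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, char_feats, lambd=1.0):
        # char_feats: (num_chars, feat_dim), e.g. glimpse vectors gathered
        # from an attention decoder, one per decoded character.
        reversed_feats = GradReverse.apply(char_feats, lambd)
        return self.net(reversed_feats).squeeze(-1)  # per-character domain logits


def adaptation_loss(disc, src_char_feats, tgt_char_feats, lambd=1.0):
    """Adversarial loss over character-level features from both domains.

    Minimizing this loss trains the discriminator, while the reversed
    gradients encourage the recognizer to produce features the
    discriminator cannot tell apart (character-level domain confusion).
    """
    src_logits = disc(src_char_feats, lambd)
    tgt_logits = disc(tgt_char_feats, lambd)
    src_labels = torch.zeros_like(src_logits)  # 0 = synthetic source
    tgt_labels = torch.ones_like(tgt_logits)   # 1 = real target
    return (F.binary_cross_entropy_with_logits(src_logits, src_labels)
            + F.binary_cross_entropy_with_logits(tgt_logits, tgt_labels))
```

In a full training loop, this adaptation loss would be added to the usual attention-based recognition loss on the labeled source samples and the few labeled target samples; the weighting between the two terms is a design choice not specified in the abstract.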
