Paper Title

Text Recognition in Real Scenarios with a Few Labeled Samples

Authors

Jinghuang Lin, Zhanzhan Cheng, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng Zhou

Abstract

Scene text recognition (STR) is still a hot research topic in the computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are lacking. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieving sequence-level adaptation even with a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the fine-tuning scheme and obtains performance comparable to state-of-the-art STR methods.
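To make the character-level adversarial adaptation described above more concrete, the sketch below illustrates one common way such domain confusion can be implemented. It is not the authors' code: the module names (`CharDomainDiscriminator`, `adaptation_loss`), the feature dimension, and the use of a gradient-reversal layer are assumptions for illustration; the paper's actual adversarial formulation and attention decoder may differ.

```python
# Minimal sketch (assumed PyTorch setup, not the authors' implementation) of
# character-level adversarial domain adaptation: character features from the
# synthetic source domain and the real target domain are fed to a domain
# discriminator, and reversed gradients push the recognizer toward
# domain-confused (domain-invariant) character representations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class CharDomainDiscriminator(nn.Module):
    """Classifies each character-level feature as source (synthetic) or target (real)."""

    def __init__(self, feat_dim=256):  # feat_dim is a hypothetical choice
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, char_feats, lambd=1.0):
        # char_feats: (num_chars, feat_dim), e.g. glimpse vectors gathered
        # from an attention decoder, one per decoded character.
        reversed_feats = GradReverse.apply(char_feats, lambd)
        return self.net(reversed_feats).squeeze(-1)  # per-character domain logits


def adaptation_loss(disc, src_char_feats, tgt_char_feats, lambd=1.0):
    """Adversarial loss over character-level features from both domains.

    Minimizing this loss trains the discriminator, while the reversed
    gradients encourage the recognizer to produce features the
    discriminator cannot tell apart (character-level domain confusion).
    """
    src_logits = disc(src_char_feats, lambd)
    tgt_logits = disc(tgt_char_feats, lambd)
    src_labels = torch.zeros_like(src_logits)  # 0 = synthetic source
    tgt_labels = torch.ones_like(tgt_logits)   # 1 = real target
    return (F.binary_cross_entropy_with_logits(src_logits, src_labels)
            + F.binary_cross_entropy_with_logits(tgt_logits, tgt_labels))
```

In a full training loop, this adaptation loss would be added to the usual attention-based recognition loss on the labeled source samples and the few labeled target samples; the weighting between the two terms is a design choice not specified in the abstract.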
