Paper title
Open-set Text Recognition via Character-Context Decoupling
Paper authors
Paper abstract
The open-set text recognition task is an emerging challenge that requires the additional capability to recognize novel characters during evaluation. We argue that a major cause of the limited performance of current methods is the confounding effect of contextual information on the visual information of individual characters. Under open-set scenarios, the intractable bias in contextual information can be passed down to visual information, consequently impairing classification performance. In this paper, we propose a Character-Context Decoupling framework to alleviate this problem by separating contextual information from character-visual information. Contextual information can be decomposed into temporal information and linguistic information. Temporal information, which models character order and word length, is isolated with a detached temporal attention module. Linguistic information, which models n-gram and other linguistic statistics, is separated with a decoupled context anchor mechanism. A variety of quantitative and qualitative experiments show that our method achieves promising performance on open-set, zero-shot, and closed-set text recognition datasets.