Paper Title

Self-Supervised Text Erasing with Controllable Image Synthesis

Authors

Gangwei Jiang, Shiyao Wang, Tiezheng Ge, Yuning Jiang, Ying Wei, Defu Lian

Abstract

Recent efforts on scene text erasing have shown promising results. However, existing methods require rich yet costly label annotations to obtain robust models, which limits their use in practical applications. To this end, we study an unsupervised scenario by proposing a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground truth and to accurately erase texts in the real world. We first design a style-aware image synthesis function to generate synthetic images with diversely styled texts based on two synthetic mechanisms. To bridge the text-style gap between synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms by picking style parameters under the guidance of two specifically designed rewards. The synthetic training images with erasure ground truth are then used to train a coarse-to-fine erasing network. To produce better erasing outputs, a triplet erasure loss is designed to push the refinement stage to recover background textures. Moreover, we provide a new dataset (called PosterErase), which contains 60K high-resolution posters with text and is more challenging for the text erasing task. The proposed method has been extensively evaluated on both PosterErase and the widely used SCUT-EnsText dataset. Notably, on PosterErase, our unsupervised method achieves an FID of 5.07, a relative improvement of 20.9% over existing supervised baselines.
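
The triplet erasure loss is only named in the abstract, not specified. As a rough illustration of the general idea, the PyTorch sketch below treats the refined output as an anchor that should land closer to the erasure ground truth than the coarse output does; the function name, the hinge formulation, and the `margin` value are illustrative assumptions, not the paper's actual definition.

```python
import torch
import torch.nn.functional as F

def triplet_erasure_loss(refined: torch.Tensor,
                         coarse: torch.Tensor,
                         target: torch.Tensor,
                         margin: float = 0.1) -> torch.Tensor:
    """Hypothetical hinge-style triplet loss for a coarse-to-fine eraser."""
    # Distance of the refined output to the erasure ground truth (positive pair).
    pos = F.l1_loss(refined, target)
    # Distance of the coarse output to the ground truth (negative pair);
    # detached so the hinge only pushes gradients into the refinement stage.
    neg = F.l1_loss(coarse.detach(), target)
    # Hinge: the refined result should beat the coarse one by at least `margin`.
    return F.relu(pos - neg + margin)

# Toy usage on random image batches of shape (N, C, H, W).
refined = torch.rand(2, 3, 64, 64, requires_grad=True)
coarse = torch.rand(2, 3, 64, 64)
target = torch.rand(2, 3, 64, 64)
loss = triplet_erasure_loss(refined, coarse, target)
loss.backward()
```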
