Paper title
DocEnTr: An End-to-End Document Image Enhancement Transformer
Paper authors
Paper abstract
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images in an end-to-end fashion. The encoder operates directly on pixel patches together with their positional information, without using any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Experiments show the superiority of the proposed model over state-of-the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
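The patch-based input and output described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `patchify` and `unpatchify`, the toy image, and the patch size are all hypothetical, and the transformer encoder and decoder themselves are omitted. The sketch only shows how an image might be cut into flattened pixel patches that carry a position index, and how a clean image could be reassembled from such patches.

```python
def patchify(image, patch):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Returns a list of (position_index, flat_pixels) pairs in row-major order.
    The position index stands in for the positional information a
    transformer encoder would receive alongside each patch.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append((len(tokens), flat))
    return tokens


def unpatchify(tokens, height, width, patch):
    """Reassemble an image from (position_index, flat_pixels) pairs."""
    per_row = width // patch  # number of patches in one row of the grid
    image = [[0] * width for _ in range(height)]
    for index, flat in tokens:
        top = (index // per_row) * patch
        left = (index % per_row) * patch
        for offset, value in enumerate(flat):
            image[top + offset // patch][left + offset % patch] = value
    return image


# Toy 4x4 "document" split into 2x2 patches (sizes are illustrative only).
page = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(page, 2)
restored = unpatchify(tokens, 4, 4, 2)
```

In a full model, each flattened patch would be projected to an embedding and processed by the encoder, and the decoder would predict the pixels of the corresponding clean patch before a step like `unpatchify` rebuilds the image.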