Paper Title

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Authors

Hu, Xin; Zhang, Lingling; Liu, Jun; Fan, Jinfu; You, Yang; Wu, Yaqiang

Abstract

Diagram object detection is a key foundation for practical applications such as textbook question answering. Because a diagram consists mainly of simple lines and color blocks, its visual features are sparser than those of natural images. In addition, diagrams usually express diverse knowledge and therefore contain many low-frequency object categories. As a result, traditional data-driven detection models are not well suited to diagrams. In this work, we propose a gestalt-perception transformer model (GPTR) for diagram object detection, based on an encoder-decoder architecture. Gestalt perception comprises a series of laws that explain human perception: the human visual system tends to perceive patches in an image that are similar, close, or connected without abrupt directional changes as a single perceptual whole. Inspired by these ideas, we build a gestalt-perception graph in the transformer encoder, with diagram patches as nodes and the relationships between patches as edges. This graph aims to group the patches into objects via the laws of similarity, proximity, and smoothness implied in these edges, so that meaningful objects can be detected effectively. Experimental results demonstrate that the proposed GPTR achieves the best results on the diagram object detection task. Our model also obtains results comparable to those of competitors in natural image object detection.
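The graph idea in the abstract (patches as nodes, pairwise relationships as edges) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the edges are computed, so the cosine similarity, the Gaussian proximity kernel, the `sigma` parameter, and the function names `cosine_similarity` and `build_patch_graph` are all assumptions chosen for illustration. The smoothness law is left out entirely.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_patch_graph(features, grid_width, sigma=1.0):
    """Hypothetical helper: build edges between all pairs of patches.

    features   -- one feature vector per patch, in row-major grid order
    grid_width -- number of patches per row of the grid
    sigma      -- assumed width of the Gaussian proximity kernel

    Returns {(i, j): (similarity, proximity)} for every pair i < j.
    """
    edges = {}
    n = len(features)
    for i in range(n):
        row_i, col_i = divmod(i, grid_width)
        for j in range(i + 1, n):
            row_j, col_j = divmod(j, grid_width)
            # Proximity: patches closer together on the grid score higher.
            dist = math.hypot(row_i - row_j, col_i - col_j)
            proximity = math.exp(-(dist ** 2) / (2 * sigma ** 2))
            # Similarity: patches with similar features score higher.
            similarity = cosine_similarity(features[i], features[j])
            edges[(i, j)] = (similarity, proximity)
    return edges


# Toy 2x2 grid: the top row shares one feature, the bottom row another.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
edges = build_patch_graph(features, grid_width=2)
for pair, (similarity, proximity) in sorted(edges.items()):
    print(pair, round(similarity, 3), round(proximity, 3))
```

In this toy grid, the two top-row patches are both similar and adjacent, so their edge scores highly on both criteria, while a top-row and bottom-row pair scores zero on similarity. A grouping step could then merge patches whose edges are strong on both measures; how GPTR actually uses these edges inside the transformer encoder is not described in the abstract.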
