Paper title
What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions
Paper authors
Paper abstract
We propose a novel one-stage Transformer-based model, the Semantic and Spatial Refined Transformer (SSRT), for the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules that help select the most relevant object-action pairs within an image and refine the queries' representation using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks: V-COCO and HICO-DET.