What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

Paper title

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

Authors

Iftekhar, A S M; Chen, Hao; Kundu, Kaustav; Li, Xinyu; Tighe, Joseph; Modolo, Davide

Abstract

We propose a novel one-stage Transformer-based approach, the Semantic and Spatial Refined Transformer (SSRT), for the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules that help select the most relevant object-action pairs within an image and refine the queries' representation using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks: V-COCO and HICO-DET.
