Paper Title

DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

Authors

Zhipeng Luo, Changqing Zhou, Gongjie Zhang, Shijian Lu

Abstract

3D object detection with surround-view images is an essential task for autonomous driving. In this work, we propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images. We design a novel projective cross-attention mechanism for query-image interaction to address the limitations of existing methods in terms of geometric cue exploitation and information loss for cross-view objects. In addition, we introduce a heatmap generation technique that bridges 3D and 2D spaces efficiently via query initialization. Furthermore, unlike the common practice of fusing intermediate spatial features for temporal aggregation, we provide a new perspective by introducing a novel hybrid approach that performs cross-frame fusion over past object queries and image features, enabling efficient and robust modeling of temporal information. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of the proposed DETR4D.
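The core idea behind the projective cross-attention described above can be illustrated with a minimal sketch: each object query carries a 3D reference point, which is projected into every camera view with that view's projection matrix; image features are then sampled at valid projections and aggregated. This is a hypothetical simplification, not the paper's implementation: the function name, nearest-neighbor sampling, and plain averaging (rather than learned attention weights) are all assumptions made for brevity.

```python
import numpy as np

def projective_cross_attention(queries, ref_points_3d, feat_maps, proj_mats):
    """Hypothetical sketch of query-image interaction via 3D-to-2D projection.

    queries:       (N, D)       object queries (only their count is used here)
    ref_points_3d: (N, 3)       3D reference point per query
    feat_maps:     (V, C, H, W) per-view image feature maps
    proj_mats:     (V, 3, 4)    3D-to-image projection matrix per view
    """
    n_queries = queries.shape[0]
    n_views, channels, height, width = feat_maps.shape
    out = np.zeros((n_queries, channels))
    for i in range(n_queries):
        point_h = np.append(ref_points_3d[i], 1.0)  # homogeneous 3D point
        sampled, count = np.zeros(channels), 0
        for v in range(n_views):
            uvw = proj_mats[v] @ point_h            # project into view v
            if uvw[2] <= 0:                          # behind this camera
                continue
            x = int(round(uvw[0] / uvw[2]))          # pixel column
            y = int(round(uvw[1] / uvw[2]))          # pixel row
            if 0 <= x < width and 0 <= y < height:   # inside the feature map
                # Nearest-neighbor sampling for simplicity; a real model
                # would use bilinear sampling and attention weighting.
                sampled += feat_maps[v, :, y, x]
                count += 1
        if count:
            out[i] = sampled / count                 # average over valid views
    return out  # (N, C) aggregated image features per query
```

Because a query only attends to the handful of pixels its reference point projects to, the interaction is sparse by construction, and objects visible in multiple overlapping views naturally aggregate features from all of them.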
