Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

Title

Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training

Authors

Li, Zhenyu; Chen, Zehui; Li, Ang; Fang, Liangji; Jiang, Qinhong; Liu, Xianming; Jiang, Junjun

Abstract

Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, drastic performance degradation in practical cross-domain deployment remains an under-studied challenge, because labels are unavailable on the target domain. In this paper, we first comprehensively investigate the significant underlying factors of the domain gap in Mono3D. The critical observation is a depth-shift issue caused by the geometric misalignment of domains. We then propose STMono3D, a new self-teaching framework for unsupervised domain adaptation (UDA) on Mono3D. To mitigate the depth shift, we introduce a geometry-aligned multi-scale training strategy that disentangles the camera parameters and guarantees the geometric consistency of domains. Based on this, we develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain. Benefiting from the end-to-end framework, which provides richer information about the pseudo labels, we propose a quality-aware supervision strategy that takes instance-level pseudo confidences into account and improves the effectiveness of the target-domain training process. Moreover, a positive focusing training strategy and a dynamic threshold are proposed to handle the large numbers of false-negative (FN) and false-positive (FP) pseudo samples. STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset. To the best of our knowledge, this is the first study to explore effective UDA methods for Mono3D.
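The depth-shift observation can be illustrated with the standard pinhole camera model. The sketch below is not the authors' implementation, and the abstract does not specify how the geometry-aligned multi-scale training is realized. It only shows two well-known facts that such a strategy would rely on: resizing an image by a factor `s` scales the intrinsics by `s`, and the pixel height of an object at a fixed depth depends on the focal length. All names (`Intrinsics`, `rescale_intrinsics`, `project`, `apparent_height`) and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical helper type)."""
    fx: float
    fy: float
    cx: float
    cy: float


def rescale_intrinsics(k: Intrinsics, scale: float) -> Intrinsics:
    """Intrinsics of the same camera after resizing the image by `scale`."""
    return Intrinsics(k.fx * scale, k.fy * scale, k.cx * scale, k.cy * scale)


def project(k: Intrinsics, point: tuple) -> tuple:
    """Project a 3D point (x, y, z) in camera coordinates to pixel (u, v)."""
    x, y, z = point
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def apparent_height(fy: float, height_m: float, depth_m: float) -> float:
    """Pixel height of an object of metric height `height_m` at `depth_m`."""
    return fy * height_m / depth_m


# Illustrative values only; they do not come from any dataset.
source_cam = Intrinsics(fx=720.0, fy=720.0, cx=640.0, cy=360.0)
half_res_cam = rescale_intrinsics(source_cam, 0.5)
point = (2.0, 1.0, 20.0)

u_full, v_full = project(source_cam, point)
u_half, v_half = project(half_res_cam, point)

# The same 1.5 m object at 20 m looks different under two focal lengths,
# so a detector that maps pixel size to depth is biased on the new camera.
h_source = apparent_height(720.0, 1.5, 20.0)
h_target = apparent_height(1260.0, 1.5, 20.0)
```

Because the projection of a fixed 3D point scales with the image, keeping the intrinsics in sync with each resize keeps the 2D-3D geometry consistent across training scales, which is one plausible reading of "geometry-aligned".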