CubifAE-3D: Monocular Camera Space Cubification for Auto-Encoder based 3D Object Detection

Paper Title

CubifAE-3D: Monocular Camera Space Cubification for Auto-Encoder based 3D Object Detection

Authors

Shrivastava, Shubham; Chakravarty, Punarjay

Abstract

We introduce a method for 3D object detection using a single monocular image. Starting from a synthetic dataset, we pre-train an RGB-to-Depth Auto-Encoder (AE). The embedding learnt from this AE is then used to train a 3D Object Detector (3DOD) CNN, which regresses the parameters of 3D object poses from the latent embedding that the AE encoder generates from the RGB image. We show that we can pre-train the AE once using paired RGB and depth images from simulation data, and subsequently train only the 3DOD network using real data comprising RGB images and 3D object pose labels (without the requirement of dense depth). Our 3DOD network utilizes a particular 'cubification' of 3D space around the camera, where each cuboid is tasked with predicting N object poses, along with their class and confidence values. The AE pre-training and this method of dividing the 3D space around the camera into cuboids give our method its name: CubifAE-3D. We demonstrate results for monocular 3D object detection in the Autonomous Vehicle (AV) use-case with the Virtual KITTI 2 and the KITTI datasets.
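
The 'cubification' idea in the abstract, dividing the 3D space around the camera into cuboids that each account for up to N objects, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the camera-space extents, the grid resolution, the axis convention, and the helper names `cuboid_index` and `assign_objects` are all hypothetical. The sketch only shows how object centres might be bucketed into cuboids; it does not reproduce the authors' network or label encoding.

```python
# Hypothetical camera-space extents in metres (x right, y down, z forward)
# and grid resolution. Illustrative values only, not taken from the paper.
X_RANGE = (-40.0, 40.0)
Y_RANGE = (-5.0, 5.0)
Z_RANGE = (0.0, 100.0)
GRID = (4, 1, 10)  # number of cuboids along x, y, z


def cuboid_index(center, ranges=(X_RANGE, Y_RANGE, Z_RANGE), grid=GRID):
    """Return the (i, j, k) cuboid containing a 3D point, or None if outside."""
    idx = []
    for value, (lo, hi), n in zip(center, ranges, grid):
        if not lo <= value < hi:
            return None
        # Scale the coordinate into [0, n) and clamp against rounding error.
        cell = int((value - lo) / (hi - lo) * n)
        idx.append(min(cell, n - 1))
    return tuple(idx)


def assign_objects(centers, max_per_cuboid=2):
    """Group object centres by cuboid, keeping at most N per cuboid."""
    cells = {}
    for c in centers:
        idx = cuboid_index(c)
        if idx is None:
            continue  # object lies outside the cubified volume
        bucket = cells.setdefault(idx, [])
        if len(bucket) < max_per_cuboid:
            bucket.append(c)
    return cells


if __name__ == "__main__":
    objects = [(0.0, 0.0, 25.0), (1.0, 0.0, 26.0), (50.0, 0.0, 10.0)]
    print(assign_objects(objects))
```

In this sketch, `max_per_cuboid` plays the role of N: objects beyond that count in the same cuboid are simply dropped, which is one possible design choice among several.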
