Paper Title

SS3D: Single Shot 3D Object Detector

Paper Authors

Aniket Limaye, Manu Mathew, Soyeb Nagori, Pramod Kumar Swami, Debapriya Maji, Kumar Desappan

Paper Abstract

Single-stage deep learning algorithms for 2D object detection were made popular by the Single Shot MultiBox Detector (SSD) and have been heavily adopted in several embedded applications. PointPillars is a state-of-the-art 3D object detection algorithm that uses a Single Shot Detector adapted for 3D object detection. The main downside of PointPillars is that it takes a two-stage approach: a learned input representation based on fully connected layers, followed by the Single Shot Detector for 3D detection. In this paper we present Single Shot 3D Object Detection (SS3D), a single-stage 3D object detection algorithm that combines a straightforward, statistically computed input representation with a Single Shot Detector (based on PointPillars). Computing the input representation is straightforward, does not involve learning, and adds little computational cost. We also extend our method to stereo input and show that, aided by an additional semantic segmentation input, our method produces accuracy similar to state-of-the-art stereo-based detectors. Achieving the accuracy of two-stage detectors with a single-stage approach is important because single-stage approaches are simpler to implement in embedded, real-time applications. With LiDAR as well as stereo input, our method outperforms PointPillars. When using LiDAR input, our input representation improves the AP3D of Car objects in the moderate category from 74.99 to 76.84. When using stereo input, our input representation improves the AP3D of Car objects in the moderate category from 38.13 to 45.13. Our results are also better than those of other popular 3D object detectors such as AVOD and F-PointNet.
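The abstract does not spell out which statistics make up the non-learned input representation, so the sketch below only illustrates the general idea: bin LiDAR points into a bird's-eye-view pillar grid and replace PointPillars' learned, fully connected pillar encoder with simple per-pillar statistics. The function name `statistical_pillar_features`, the grid extents, the 0.16 m pillar size, and the particular statistics (point count, mean and max height, mean intensity) are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def statistical_pillar_features(points,
                                x_range=(0.0, 69.12),
                                y_range=(-39.68, 39.68),
                                pillar_size=0.16):
    """Summarize a LiDAR sweep as a BEV grid of per-pillar statistics.

    points: (N, 4) array of [x, y, z, intensity] returns.
    Returns a (C, H, W) feature map for a single-shot detector backbone.
    Grid extents, pillar size and the chosen statistics are illustrative.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))

    # Keep only points that fall inside the BEV grid.
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]

    # Flat pillar index for every point.
    ix = np.clip(((pts[:, 0] - x_range[0]) / pillar_size).astype(np.int64), 0, nx - 1)
    iy = np.clip(((pts[:, 1] - y_range[0]) / pillar_size).astype(np.int64), 0, ny - 1)
    flat = ix * ny + iy

    count = np.bincount(flat, minlength=nx * ny).astype(np.float32)
    occupied = count > 0
    safe = np.maximum(count, 1.0)  # avoid division by zero for empty pillars

    feat = np.zeros((nx * ny, 4), dtype=np.float32)
    feat[:, 0] = np.minimum(count, 32.0) / 32.0                                  # clipped, normalized point count
    feat[:, 1] = np.bincount(flat, weights=pts[:, 2], minlength=nx * ny) / safe  # mean height
    feat[:, 3] = np.bincount(flat, weights=pts[:, 3], minlength=nx * ny) / safe  # mean intensity

    max_z = np.full(nx * ny, -np.inf, dtype=np.float32)
    np.maximum.at(max_z, flat, pts[:, 2])                                        # max height per pillar
    feat[occupied, 2] = max_z[occupied]

    return feat.reshape(nx, ny, 4).transpose(2, 0, 1)
```

The resulting (C, H, W) map could be fed directly to a PointPillars-style single-shot detection backbone; replacing the learned, fully connected pillar encoder with such fixed statistics is where the computational saving described in the abstract would come from.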
