Paper Title
Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline
Paper Authors
Paper Abstract
Moving object segmentation is a crucial task for autonomous vehicles as it can be used to segment objects in a class-agnostic manner based on their motion cues. It enables the detection of objects unseen during training (e.g., a moose or a construction truck) based on their motion and independent of their appearance. Although pixel-wise motion segmentation has been studied in the autonomous driving literature, it has rarely been addressed at the instance level, which would help separate connected segments of moving objects, leading to better trajectory planning. As the main issue is the lack of large public datasets, we create a new InstanceMotSeg dataset comprising 12.9K samples, improving upon our KITTIMoSeg dataset. In addition to providing instance-level annotations, we add four additional classes, which are crucial for studying class-agnostic motion segmentation. We adapt YOLACT and implement a motion-based class-agnostic instance segmentation model, which acts as a baseline for the dataset. We also extend it to an efficient multi-task model which additionally provides semantic instance segmentation while sharing the encoder. The model then learns separate prototype coefficients within the class-agnostic and semantic heads, providing two independent paths of object detection for redundant safety. To obtain real-time performance, we study different efficient encoders and obtain 39 fps on a Titan Xp GPU using MobileNetV2, with an improvement of 10% mAP relative to the baseline. Our model improves upon the previous state-of-the-art motion segmentation method by 3.3%. The dataset and a video of qualitative results are shared on our website at https://sites.google.com/view/instancemotseg/.
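The multi-task design described in the abstract lends itself to a shared-encoder, two-head structure. Below is a minimal PyTorch sketch of that layout, not the authors' implementation: the class name MultiTaskMotionSegNet, the channel sizes, and the dense coefficient maps are illustrative assumptions, and a full YOLACT-style model would additionally predict per-anchor boxes and classes and fuse optical flow with the RGB input.

```python
import torch
import torch.nn as nn
import torchvision

class MultiTaskMotionSegNet(nn.Module):
    """Hypothetical sketch of a shared-encoder multi-task model in the
    spirit of the abstract: one MobileNetV2 encoder feeding a YOLACT-style
    protonet plus two heads that each predict their own prototype
    coefficients -- a class-agnostic (motion) head and a semantic head."""

    def __init__(self, num_semantic_classes=5, num_prototypes=32):
        super().__init__()
        # Shared encoder (the paper's real-time variant uses MobileNetV2).
        self.encoder = torchvision.models.mobilenet_v2(weights=None).features
        feat_ch = 1280  # MobileNetV2 final feature channels

        # Protonet: produces shared prototype masks (YOLACT-style).
        self.protonet = nn.Sequential(
            nn.Conv2d(feat_ch, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_prototypes, 1),
        )
        # Two independent heads, each regressing its own coefficients;
        # instance masks are linear combinations of the shared prototypes.
        # (Simplified to dense maps here; YOLACT predicts them per anchor.)
        self.motion_head = nn.Conv2d(feat_ch, num_prototypes, 1)
        self.semantic_head = nn.Conv2d(
            feat_ch, num_prototypes * num_semantic_classes, 1
        )

    def forward(self, x):
        feats = self.encoder(x)
        prototypes = self.protonet(feats)            # (B, P, H', W')
        motion_coeffs = self.motion_head(feats)      # class-agnostic path
        semantic_coeffs = self.semantic_head(feats)  # semantic path
        return prototypes, motion_coeffs, semantic_coeffs


# Quick shape check on a dummy input:
model = MultiTaskMotionSegNet()
protos, m_coeffs, s_coeffs = model(torch.randn(1, 3, 256, 512))
```

Keeping the two heads independent while sharing one encoder is what gives the redundant safety path the abstract mentions: even if the appearance-driven semantic head misses an unseen object class, the motion-driven class-agnostic head can still detect it from motion cues at little extra compute cost.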