Paper Title

MVP: Robust Multi-View Practice for Driving Action Localization

Authors

Jingjie Shang, Kunchang Li, Kaibin Tian, Haisheng Su, Yangguang Li

Abstract

Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with dense action annotations. Due to the small data scale and unclear action boundaries, the dataset presents a unique challenge: precisely localizing all the different actions and classifying their categories. In this paper, we make good use of the multi-view synchronization among videos and conduct robust Multi-View Practice (MVP) for driving action localization. To avoid overfitting, we fine-tune SlowFast with Kinetics-700 pre-training as the feature extractor. The features of different views are then passed to ActionFormer to generate candidate action proposals. To precisely localize all the actions, we design elaborate post-processing, including model voting, threshold filtering, and duplicate removal. The results show that our MVP is robust for driving action localization, achieving a 28.49% F1-score on the Track3 test set.
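
To make the post-processing stage concrete, here is a minimal sketch of what "threshold filtering and duplicate removal" over proposals pooled from multiple views could look like. Everything in it is an assumption for illustration: the `Proposal` layout, the `post_process` function, and the threshold values are hypothetical, and the greedy overlap check merely stands in for whatever voting/merging rule the paper actually uses.

```python
# Hypothetical sketch, NOT the paper's implementation: pool ActionFormer
# proposals from all views, drop low-confidence ones, then remove
# temporally overlapping duplicates of the same class.
from dataclasses import dataclass

@dataclass
class Proposal:
    start: float   # action start time (seconds)
    end: float     # action end time (seconds)
    label: int     # predicted action class
    score: float   # proposal confidence

def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def post_process(proposals, score_thr=0.3, iou_thr=0.5):
    """Threshold filtering, then greedy per-class duplicate removal.

    Pooling candidates from all views acts as a crude form of voting:
    an action seen from several views yields overlapping proposals,
    and only the highest-scoring one survives.
    """
    # 1) Threshold filtering: drop low-confidence candidates.
    kept = [p for p in proposals if p.score >= score_thr]
    # 2) Duplicate removal: NMS-style pass, highest score first.
    kept.sort(key=lambda p: p.score, reverse=True)
    final = []
    for p in kept:
        duplicate = any(
            q.label == p.label and temporal_iou(p, q) >= iou_thr
            for q in final
        )
        if not duplicate:
            final.append(p)
    return final

if __name__ == "__main__":
    candidates = [
        Proposal(12.0, 18.5, label=3, score=0.81),  # e.g. dashboard view
        Proposal(12.4, 18.0, label=3, score=0.77),  # duplicate, other view
        Proposal(40.0, 44.0, label=7, score=0.12),  # filtered out by score
    ]
    for p in post_process(candidates):
        print(f"class {p.label}: {p.start:.1f}s - {p.end:.1f}s ({p.score:.2f})")
```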
