Paper Title

MVP: Robust Multi-View Practice for Driving Action Localization

Authors

Jingjie Shang, Kunchang Li, Kaibin Tian, Haisheng Su, Yangguang Li

Abstract

Distracted driving causes thousands of deaths per year, and how to apply deep-learning methods to prevent these tragedies has become a crucial problem. In Track3 of the 6th AI City Challenge, researchers provide a high-quality video dataset with dense action annotations. Due to the small data scale and unclear action boundaries, the dataset presents a unique challenge: precisely localizing all the different actions and classifying their categories. In this paper, we make good use of the multi-view synchronization among videos and conduct robust Multi-View Practice (MVP) for driving action localization. To avoid overfitting, we fine-tune SlowFast with Kinetics-700 pre-training as the feature extractor. The features of different views are then passed to ActionFormer to generate candidate action proposals. To precisely localize all the actions, we design elaborate post-processing, including model voting, threshold filtering, and duplicate removal. The results show that our MVP is robust for driving action localization, achieving a 28.49% F1-score on the Track3 test set.
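
To make the post-processing stage concrete, here is a minimal sketch of what "threshold filtering and duplicate removal" over proposals pooled from multiple views could look like. Everything in it is an assumption for illustration: the `Proposal` layout, the `post_process` function, and the threshold values are hypothetical, and the greedy overlap check merely stands in for whatever voting/merging rule the paper actually uses.

```python
# Hypothetical sketch, NOT the paper's implementation: pool ActionFormer
# proposals from all views, drop low-confidence ones, then remove
# temporally overlapping duplicates of the same class.
from dataclasses import dataclass

@dataclass
class Proposal:
    start: float   # action start time (seconds)
    end: float     # action end time (seconds)
    label: int     # predicted action class
    score: float   # proposal confidence

def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def post_process(proposals, score_thr=0.3, iou_thr=0.5):
    """Threshold filtering, then greedy per-class duplicate removal.

    Pooling candidates from all views acts as a crude form of voting:
    an action seen from several views yields overlapping proposals,
    and only the highest-scoring one survives.
    """
    # 1) Threshold filtering: drop low-confidence candidates.
    kept = [p for p in proposals if p.score >= score_thr]
    # 2) Duplicate removal: NMS-style pass, highest score first.
    kept.sort(key=lambda p: p.score, reverse=True)
    final = []
    for p in kept:
        duplicate = any(
            q.label == p.label and temporal_iou(p, q) >= iou_thr
            for q in final
        )
        if not duplicate:
            final.append(p)
    return final

if __name__ == "__main__":
    candidates = [
        Proposal(12.0, 18.5, label=3, score=0.81),  # e.g. dashboard view
        Proposal(12.4, 18.0, label=3, score=0.77),  # duplicate, other view
        Proposal(40.0, 44.0, label=7, score=0.12),  # filtered out by score
    ]
    for p in post_process(candidates):
        print(f"class {p.label}: {p.start:.1f}s - {p.end:.1f}s ({p.score:.2f})")
```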
