ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer

Paper Title

ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer

Authors

Xu Shen, Matthew Lacayo, Nidhir Guggilla, Francesco Borrelli

Abstract

The problem of multimodal intent and trajectory prediction for human-driven vehicles in parking lots is addressed in this paper. Using models designed with CNN and Transformer networks, we extract temporal-spatial and contextual information from trajectory history and local bird's eye view (BEV) semantic images, and generate predictions about intent distribution and future trajectory sequences. Our methods outperform existing models in accuracy, while allowing an arbitrary number of modes, encoding complex multi-agent scenarios, and adapting to different parking maps. To train and evaluate our method, we present the first public 4K video dataset of human driving in parking lots with accurate annotation, high frame rate, and rich traffic scenarios.
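The abstract says the model predicts an intent distribution and allows an arbitrary number of modes. The sketch below is not the authors' implementation. It assumes, hypothetically, that the network emits one unnormalized score per candidate intent (for example, a candidate parking spot or a "keep driving" option) and shows how a softmax turns those scores into a probability distribution from which any number k of modes can be selected. The function names `softmax` and `top_k_intents`, the intent labels, and the scores are illustrative only.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_intents(scores: List[float], labels: List[str], k: int) -> List[Tuple[str, float]]:
    """Return up to k (label, probability) pairs, most probable first.

    Hypothetical helper: assumes one unnormalized score per candidate intent.
    Because k is just a slice length, any number of modes can be requested.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up scores for four hypothetical candidate intents.
    labels = ["spot_A", "spot_B", "spot_C", "drive_straight"]
    scores = [2.0, 0.5, 1.0, -1.0]
    for label, p in top_k_intents(scores, labels, k=3):
        print(f"{label}: {p:.3f}")
```

In a full predictor, each selected intent would presumably condition a trajectory decoder that produces one future sequence per mode; that decoding step, and the CNN/Transformer encoders that would produce the scores, are not shown here.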
