Paper Title

Transformer Networks for Trajectory Forecasting

Paper Authors

Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso

Abstract

Most recent successes in forecasting people's motion are based on LSTM models, and most recent progress has been achieved by modelling the social interactions among people and the interactions of people with the scene. We question the use of LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the attention-only memory mechanisms of Transformers. In particular, we consider both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposed Transformers predict the trajectories of the individual people in the scene. These are "simple" models because each person is modelled separately, without any complex human-human or scene interaction terms. In particular, the TF model without bells and whistles yields the best score on the largest and most challenging trajectory forecasting benchmark, TrajNet. Additionally, its extension which predicts multiple plausible future trajectories performs on par with more engineered techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers may deal with missing observations, as may be the case with real sensor data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer.
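The switch the abstract describes, from sequential step-by-step recurrence to attention-based memory, can be illustrated with a minimal sketch: in self-attention, every observed timestep of a trajectory attends to every other timestep at once, rather than being consumed one step at a time as in an LSTM. The sketch below is not the authors' implementation; the embedding matrix and the use of identity Q/K/V projections are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of embeddings.

    X: (T, d) array, one embedding per observed timestep.
    Unlike an LSTM, all T steps are processed in parallel: the (T, T)
    weight matrix lets each step draw on every other step directly.
    """
    d = X.shape[1]
    # Illustrative simplification: identity projections for Q, K, V.
    Q, K, V = X, X, X
    scores = Q @ K.T / np.sqrt(d)        # (T, T) pairwise attention logits
    weights = softmax(scores, axis=-1)   # each row is a distribution over steps
    return weights @ V                   # context-mixed embeddings, (T, d)

# Toy observed trajectory: 8 timesteps of (x, y) from a random walk,
# embedded into d=16 dimensions by a random linear map (hypothetical).
rng = np.random.default_rng(0)
obs = np.cumsum(rng.normal(size=(8, 2)), axis=0)
W_embed = rng.normal(size=(2, 16))
H = self_attention(obs @ W_embed)        # (8, 16)
```

In the full TF model of the paper, such attended representations are produced by stacked multi-head attention layers with learned projections, and a decoder maps them to future (x, y) positions; this sketch only shows the memory mechanism that replaces the LSTM's recurrence.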
