Paper Title

Entry-Flipped Transformer for Inference and Prediction of Participant Behavior

Authors

Bo Hu, Tat-Jen Cham

Abstract

Some group activities, such as team sports and choreographed dances, involve closely coupled interaction between participants. Here we investigate the tasks of inferring and predicting participant behavior, in terms of motion paths and actions, under such conditions. We narrow the problem to that of estimating how a set of target participants react to the behavior of other observed participants. Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction. We propose a novel Entry-Flipped Transformer (EF-Transformer), which models the relations of participants by attention mechanisms on both spatial and temporal domains. Unlike typical transformers, we tackle the problem of error accumulation by flipping the order of query, key, and value entries, to increase the importance and fidelity of observed features in the current frame. Comparative experiments show that our EF-Transformer achieves the best performance on a newly collected tennis doubles dataset, a Ceilidh dance dataset, and two pedestrian datasets. Furthermore, we show that our EF-Transformer is better at limiting accumulated errors and recovering from wrong estimations.
