Paper Title
Human and Machine Action Prediction Independent of Object Information
Paper Authors
Paper Abstract
Predicting other people's actions is key to successful social interaction, as it enables us to adjust our own behavior to the consequences of others' future actions. Studies on action recognition have focused on the importance of individual visual features of the objects involved in an action and of its context. Humans, however, recognize actions on unknown objects or even on imagined objects (pantomime). Other cues must therefore compensate for the lack of recognizable visual object features. Here, we focus on the role of inter-object relations that change during an action. We designed a virtual reality setup and tested recognition speed for 10 different manipulation actions on 50 subjects. All objects were abstracted by emulated cubes, so that the actions could not be inferred from object information. Instead, subjects had to rely only on the information carried by the changes in the spatial relations between those cubes. In spite of these constraints, our results show that subjects were able to predict actions within, on average, less than 64% of an action's duration. We employed a computational model, an enriched Semantic Event Chain (eSEC), incorporating spatial-relation information, specifically (a) touching/untouching of objects, (b) static spatial relations between objects, and (c) dynamic spatial relations between objects. Trained on the same actions as those observed by the subjects, the model successfully predicted actions even better than humans did. Information-theoretical analysis shows that eSECs make optimal use of individual cues, whereas humans presumably rely mostly on a mixed-cue strategy, which takes longer to reach recognition. Providing a better cognitive basis for action recognition may, on the one hand, improve our understanding of related human pathologies and, on the other hand, help to build robots for conflict-free human-robot cooperation. Our results open new avenues here.
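
To make the three relation types concrete, the following Python sketch shows one way such an eSEC-style encoding of abstracted cubes could look. This is not the authors' implementation: the reduced relation alphabets, the 1-D scalar positions, and all names (Touch, StaticRel, DynamicRel, encode_frame) are simplifying assumptions for illustration only.

```python
# A minimal, hypothetical sketch of an eSEC-style relational encoding.
# Assumptions: cubes are reduced to 1-D scalar positions for brevity, and
# the relation alphabets below are simplified stand-ins for the full set
# used in the paper; every name here is illustrative, not from the source.
from enum import Enum
from itertools import combinations

class Touch(Enum):                 # (a) touching / untouching
    TOUCHING = "T"
    NOT_TOUCHING = "N"

class StaticRel(Enum):             # (b) static spatial relation between two cubes
    ABOVE = "Ab"
    BELOW = "Be"

class DynamicRel(Enum):            # (c) dynamic spatial relation (relative motion)
    GETTING_CLOSER = "GC"
    MOVING_APART = "MA"
    STABLE = "S"

def encode_frame(cubes, pos, prev_pos, touch_eps=0.01):
    """Encode one observation as a column of (touch, static, dynamic)
    relation triplets, one per unordered pair of abstracted cubes."""
    column = {}
    for a, b in combinations(cubes, 2):
        d_now, d_prev = abs(pos[a] - pos[b]), abs(prev_pos[a] - prev_pos[b])
        touch = Touch.TOUCHING if d_now < touch_eps else Touch.NOT_TOUCHING
        static = StaticRel.ABOVE if pos[a] > pos[b] else StaticRel.BELOW
        if abs(d_now - d_prev) < 1e-9:
            dyn = DynamicRel.STABLE
        elif d_now < d_prev:
            dyn = DynamicRel.GETTING_CLOSER
        else:
            dyn = DynamicRel.MOVING_APART
        column[(a, b)] = (touch.value, static.value, dyn.value)
    return column

if __name__ == "__main__":
    cubes = ["hand", "cube1", "cube2"]
    prev = {"hand": 1.0, "cube1": 0.0, "cube2": 0.5}
    now = {"hand": 0.6, "cube1": 0.0, "cube2": 0.5}  # hand approaches cube1
    print(encode_frame(cubes, now, prev))
```

In a chain built this way, a new column would be appended only when some pairwise relation changes, so the representation records discrete relational events rather than raw frames; matching a growing partial chain against trained action templates is what would allow an action to be classified before it completes, mirroring the prediction-before-end-of-action behavior reported in the abstract.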