Paper Title
BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios
Paper Authors
Paper Abstract
Humans with an average level of social cognition can infer the beliefs of others based solely on nonverbal communication signals (e.g., gaze, gesture, pose, and contextual information) exhibited during social interactions. This social cognitive ability to predict human beliefs and intentions is more important than ever for ensuring safe human-robot interaction and collaboration. This paper uses the combined knowledge of Theory of Mind (ToM) and Object-Context Relations to investigate methods for enhancing collaboration between humans and autonomous systems in environments where verbal communication is prohibited. We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems to predict human belief states in object-context scenarios. The proposed dataset consists of precisely labelled ground-truth human belief states and multimodal inputs replicating all nonverbal communication cues captured by human perception. We further evaluate our dataset with existing deep learning models and provide new insights into the effects of the various input modalities and object-context relations on the performance of the baseline models.
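The abstract describes evaluating models that predict a human's belief state from multiple nonverbal modalities (gaze, gesture, pose, context). As a minimal illustrative sketch of that evaluation setup, the snippet below combines hypothetical per-modality scores by late fusion and measures prediction accuracy against ground-truth labels. The belief-state labels, modality keys, and fusion scheme are assumptions for illustration, not the dataset's actual format or the paper's baseline models.

```python
# Hypothetical sketch of belief-state prediction via late fusion of
# nonverbal modalities. Labels and modality names are illustrative
# assumptions, not the BOSS dataset's actual schema.
from collections import Counter

def predict_belief(modality_scores):
    """Average per-modality score dicts (e.g. gaze, gesture, pose,
    context), then return the highest-scoring belief state."""
    totals = Counter()
    for scores in modality_scores.values():
        for state, score in scores.items():
            totals[state] += score / len(modality_scores)
    return max(totals, key=totals.get)

def accuracy(predictions, ground_truth):
    """Fraction of frames whose prediction matches the labelled state."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Example: gaze strongly suggests a false belief, context weakly disagrees.
frame = {
    "gaze":    {"false_belief": 0.7, "true_belief": 0.3},
    "context": {"false_belief": 0.4, "true_belief": 0.6},
}
print(predict_belief(frame))  # -> false_belief (0.55 vs 0.45)
```

Late fusion is used here only because it makes the per-modality contributions explicit; the paper's baselines are deep learning models operating directly on the video inputs.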