Paper Title
Introducing Representations of Facial Affect in Automated Multimodal Deception Detection

Authors

Leena Mathur, Maja J. Matarić

Abstract
Automated deception detection systems can enhance health, justice, and security in society by helping humans detect deceivers in high-stakes situations across medical and legal domains, among others. This paper presents a novel analysis of the discriminative power of dimensional representations of facial affect for automated deception detection, along with interpretable features from visual, vocal, and verbal modalities. We used a video dataset of people communicating truthfully or deceptively in real-world, high-stakes courtroom situations. We leveraged recent advances in automated emotion recognition in-the-wild by implementing a state-of-the-art deep neural network trained on the Aff-Wild database to extract continuous representations of facial valence and facial arousal from speakers. We experimented with unimodal Support Vector Machines (SVM) and SVM-based multimodal fusion methods to identify effective features, modalities, and modeling approaches for detecting deception. Unimodal models trained on facial affect achieved an AUC of 80%, and facial affect contributed towards the highest-performing multimodal approach (adaptive boosting) that achieved an AUC of 91% when tested on speakers who were not part of training sets. This approach achieved a higher AUC than existing automated machine learning approaches that used interpretable visual, vocal, and verbal features to detect deception in this dataset, but did not use facial affect. Across all videos, deceptive and truthful speakers exhibited significant differences in facial valence and facial arousal, contributing computational support to existing psychological theories on affect and deception. The demonstrated importance of facial affect in our models informs and motivates the future development of automated, affect-aware machine learning approaches for modeling and detecting deception and other social behaviors in-the-wild.
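To make the evaluation protocol concrete, the following is a minimal sketch (not the authors' released code) of speaker-independent cross-validation with AUC for a unimodal SVM and an adaptive-boosting classifier, using scikit-learn. The feature matrix here is random placeholder data standing in for the paper's actual valence/arousal and multimodal features; the dataset sizes and model hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: speaker-independent AUC evaluation of an SVM
# and an AdaBoost ensemble. Features are random placeholders, not the
# paper's real facial-affect / visual / vocal / verbal features.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))             # one feature vector per video (placeholder)
y = rng.integers(0, 2, size=120)           # 1 = deceptive, 0 = truthful (placeholder)
speakers = rng.integers(0, 30, size=120)   # speaker ID for each video (placeholder)

models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True)),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

# GroupKFold keeps all of a speaker's videos entirely in train or in test,
# mirroring evaluation on "speakers who were not part of training sets".
results = {}
for name, model in models.items():
    aucs = []
    for tr, te in GroupKFold(n_splits=5).split(X, y, groups=speakers):
        model.fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))
    results[name] = float(np.mean(aucs))
    print(f"{name}: mean AUC = {results[name]:.2f}")
```

Because the features above are random noise, the printed AUCs hover near chance; the point of the sketch is the grouping by speaker, which prevents identity leakage between training and test folds.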
