Paper Title
Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities?
Paper Authors
Paper Abstract
Recognising continuous emotions and action unit (AU) intensities from face videos requires a spatial and temporal understanding of expression dynamics. Existing works primarily rely on 2D face appearance to extract such dynamics. This work focuses on a promising alternative based on parametric 3D face shape alignment models, which disentangle different factors of variation, including expression-induced shape variations. We aim to understand how expressive 3D face shapes are in estimating valence-arousal and AU intensities compared with state-of-the-art 2D appearance-based models. We benchmark four recent 3D face alignment models: ExpNet, 3DDFA-V2, DECA, and EMOCA. In valence-arousal estimation, expression features of 3D face models consistently surpass previous works, yielding average concordance correlations of .739 and .574 on the SEWA and AVEC 2019 CES corpora, respectively. We also study how 3D face shapes perform on AU intensity estimation on the BP4D and DISFA datasets, and find that 3D face features are on par with 2D appearance features for AUs 4, 6, 10, 12, and 25, but not for the entire set of AUs. To understand this discrepancy, we conduct a correspondence analysis between valence-arousal and AUs, which indicates that accurate valence-arousal prediction may require knowledge of only a few AUs.
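The concordance correlation coefficient (CCC) reported above is the standard agreement metric for valence-arousal estimation; it rewards both correlation and calibration (matching mean and scale) between predicted and ground-truth traces. Below is a minimal sketch of Lin's CCC, assuming per-frame valence or arousal labels given as NumPy arrays; this follows the textbook definition and is not the authors' released evaluation code.

```python
import numpy as np

def concordance_correlation(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Lin's concordance correlation coefficient:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).
    Returns 1.0 for perfect agreement, 0.0 for no agreement."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    # Biased (population) covariance, consistent with np.var's default ddof=0.
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return float(2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2))

# Example: a prediction with the right trend but a constant offset
# scores below 1.0 under CCC, even though Pearson correlation is 1.0.
valence_true = np.array([0.1, 0.3, 0.5, 0.4, 0.2])
valence_pred = valence_true + 0.2
print(round(concordance_correlation(valence_true, valence_pred), 3))
```

Note that, unlike Pearson correlation, CCC penalises systematic bias in the predictions, which is why it is the preferred metric on continuous-emotion benchmarks such as SEWA and AVEC 2019 CES.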