Paper Title

Dynamic Time-Alignment of Dimensional Annotations of Emotion using Recurrent Neural Networks

Paper Authors

Sina Alisamir, Fabien Ringeval, Francois Portet

Paper Abstract

Most automatic emotion recognition systems exploit time-continuous annotations of emotion to provide fine-grained descriptions of spontaneous expressions as observed in real-life interactions. As emotion is rather subjective, its annotation is usually performed by several annotators who each provide a trace for a given dimension, i.e. a time-continuous series describing a dimension such as arousal or valence. However, annotations of the same expression are rarely consistent between annotators, either in time or in value, which adds bias and delay to the traces used to learn predictive models of emotion. We therefore propose a method that can dynamically compensate for inconsistencies across annotations and synchronise the traces with the corresponding acoustic features using Recurrent Neural Networks. Experimental evaluations were carried out on several emotion data sets that include Chinese, French, German, and Hungarian participants who interacted remotely, either in noise-free conditions or in-the-wild. The results show that our method can significantly increase inter-annotator agreement, as well as the correlation between traces and audio features, for both arousal and valence. In addition, improvements are obtained in the automatic prediction of these dimensions using simple lightweight models, especially for valence in noise-free conditions and for arousal in recordings captured in-the-wild.
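
The abstract does not detail the architecture or training objective, so the following is only a minimal sketch, in PyTorch, of the general idea: a recurrent network (here a GRU) reads per-frame acoustic features together with one annotator's raw trace and emits a re-aligned trace value for each frame. All names (TraceAligner, the hidden size, the feature dimension) are hypothetical and not taken from the paper; the agreement measure shown is the concordance correlation coefficient, which is standard in this line of work but is assumed here rather than confirmed by the abstract.

# Minimal, hypothetical sketch (not the authors' code).
import torch
import torch.nn as nn

def ccc(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Concordance correlation coefficient, a common agreement measure
    # for dimensional emotion traces (assumed metric; the paper may differ).
    x_mean, y_mean = x.mean(), y.mean()
    cov = ((x - x_mean) * (y - y_mean)).mean()
    return 2 * cov / (x.var(unbiased=False) + y.var(unbiased=False)
                      + (x_mean - y_mean) ** 2)

class TraceAligner(nn.Module):
    """Hypothetical RNN that re-aligns an annotation trace to the audio."""
    def __init__(self, n_acoustic: int, hidden: int = 64):
        super().__init__()
        # Input per frame: acoustic features concatenated with the raw trace.
        self.rnn = nn.GRU(n_acoustic + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one re-aligned value per frame

    def forward(self, acoustic: torch.Tensor, trace: torch.Tensor) -> torch.Tensor:
        # acoustic: (batch, time, n_acoustic); trace: (batch, time, 1)
        x = torch.cat([acoustic, trace], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)

# Toy usage: 2 clips, 100 frames, 40 acoustic features (e.g. filterbanks).
model = TraceAligner(n_acoustic=40)
acoustic, trace = torch.randn(2, 100, 40), torch.randn(2, 100, 1)
aligned = model(acoustic, trace)  # (2, 100, 1) re-aligned trace
print(ccc(aligned[0, :, 0], trace[0, :, 0]))

In practice such a model would be trained so that the output traces agree better with one another and with the acoustic evidence, e.g. via a CCC-based loss; consult the paper itself for the actual architecture and objective.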
