Paper Title
Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition
Paper Authors
Paper Abstract
Multimodal dimensional emotion recognition has drawn great attention from the affective computing community, and numerous schemes have been extensively investigated, leading to significant progress in this area. However, several questions remain unanswered for most existing approaches, including: (i) how to simultaneously learn compact yet representative features from multimodal data, (ii) how to effectively capture complementary features from multimodal streams, and (iii) how to perform all of these tasks in an end-to-end manner. To address these challenges, in this paper we propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short-term memory (LSTM) network that effectively integrates visual and audio signal streams for emotion recognition. To validate the robustness of the proposed architecture, we carry out extensive experiments on the multimodal emotion in-the-wild dataset RECOLA. Experimental results show that the proposed method achieves state-of-the-art recognition performance and surpasses existing schemes by a significant margin.
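The abstract describes the architecture only at a high level: per-modality auto-encoders for compact yet representative features, fusion of the two streams, and an LSTM for temporal modelling of continuous (arousal/valence) labels. The sketch below shows one way such a pipeline could be wired in PyTorch. It is a minimal illustration only: the feature dimensions, layer sizes, concatenation fusion, and MSE placeholder loss are assumptions, not the paper's actual configuration or training objective.

```python
# Minimal sketch (illustrative assumptions, not the authors' exact model):
# two per-modality auto-encoders yield compact latent codes, the codes are
# fused by concatenation, and an LSTM regresses arousal/valence over time.
import torch
import torch.nn as nn


class StreamAutoEncoder(nn.Module):
    """Encode one modality into a compact latent code and reconstruct the
    input, so the latent remains representative of the original stream."""

    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)


class TwoStreamEmotionNet(nn.Module):
    """Fuse visual and audio latents and predict arousal/valence per frame."""

    def __init__(self, visual_dim=136, audio_dim=88, latent_dim=64, hidden_dim=128):
        super().__init__()
        self.visual_ae = StreamAutoEncoder(visual_dim, latent_dim)
        self.audio_ae = StreamAutoEncoder(audio_dim, latent_dim)
        self.lstm = nn.LSTM(2 * latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # (arousal, valence)

    def forward(self, visual_seq, audio_seq):
        # visual_seq: (batch, time, visual_dim); audio_seq: (batch, time, audio_dim)
        zv, rec_v = self.visual_ae(visual_seq)
        za, rec_a = self.audio_ae(audio_seq)
        fused = torch.cat([zv, za], dim=-1)   # complementary features from both streams
        out, _ = self.lstm(fused)             # sequential (temporal) modelling
        return self.head(out), rec_v, rec_a


# Toy end-to-end step: reconstruction terms keep the latents faithful, and a
# regression term (MSE here as a stand-in; the paper's objective is not
# reproduced) drives the emotion predictions.
model = TwoStreamEmotionNet()
v = torch.randn(4, 50, 136)        # hypothetical visual features per frame
a = torch.randn(4, 50, 88)         # hypothetical audio features per frame
target = torch.zeros(4, 50, 2)     # dummy arousal/valence labels
pred, rec_v, rec_a = model(v, a)
loss = (nn.functional.mse_loss(pred, target)
        + nn.functional.mse_loss(rec_v, v)
        + nn.functional.mse_loss(rec_a, a))
loss.backward()
```

Because the reconstruction and regression terms are summed into a single loss, the encoders, fusion, and LSTM can all be trained jointly, which is one straightforward way to realise the end-to-end training the abstract emphasises.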