Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling

Paper Title

Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling

Authors

Sowmya Rasipuram, Junaid Hamid Bhat, Anutosh Maitra

Abstract

Continuous affect prediction in the wild is an interesting problem, and a challenging one because continuous prediction involves heavy computation. This paper presents the methodologies and techniques used in our contribution to predicting continuous emotion dimensions, i.e., valence and arousal, in the ABAW competition on the Aff-Wild2 database. The Aff-Wild2 database consists of videos in the wild labelled for valence and arousal at the frame level. Our proposed methodology fuses audio and video features (multi-modal) extracted using state-of-the-art methods. These audio-video features are used to train a sequence-to-sequence model based on Gated Recurrent Units (GRU). We show promising results on validation data with a simple architecture. The overall valence and arousal scores of the proposed approach are 0.22 and 0.34, which are better than the competition baselines of 0.14 and 0.24, respectively.
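The abstract does not name the metric behind the valence and arousal scores. The ABAW valence-arousal track is scored with the Concordance Correlation Coefficient (CCC), so the figures above are presumably CCC values; that reading is an assumption here, not something the abstract states. The sketch below shows how CCC can be computed for one emotion dimension from frame-level predictions and labels. The function name `concordance_cc` and the sample values are hypothetical and are not taken from the paper.

```python
def concordance_cc(predictions, labels):
    """Concordance Correlation Coefficient between two equal-length sequences.

    CCC = 2 * cov(p, l) / (var(p) + var(l) + (mean(p) - mean(l)) ** 2)
    using population (divide-by-n) variance and covariance.
    """
    if len(predictions) != len(labels) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_l = sum(labels) / n

    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_l = sum((l - mean_l) ** 2 for l in labels) / n
    cov = sum((p - mean_p) * (l - mean_l)
              for p, l in zip(predictions, labels)) / n

    denom = var_p + var_l + (mean_p - mean_l) ** 2
    if denom == 0:
        # Both sequences are the same constant; CCC is undefined here.
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


# Hypothetical frame-level valence values in [-1, 1], for illustration only.
predicted_valence = [0.10, 0.25, 0.30, -0.05]
annotated_valence = [0.20, 0.20, 0.40, -0.10]
valence_ccc = concordance_cc(predicted_valence, annotated_valence)
```

Unlike plain Pearson correlation, CCC also penalizes a constant offset between predictions and labels: a prediction track that follows the annotation perfectly but is shifted by a fixed amount scores below 1.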

Paper Title