Paper Title

Multi-Task Learning Framework for Emotion Recognition in-the-wild

Paper Authors

Tenggan Zhang, Chuanhe Liu, Xiaolong Liu, Yuchen Liu, Liyu Meng, Lei Sun, Wenqiang Jiang, Fengyuan Zhang, Jinming Zhao, Qin Jin

Paper Abstract

This paper presents our system for the Multi-Task Learning (MTL) Challenge in the 4th Affective Behavior Analysis in-the-wild (ABAW) competition. We explore the research problems of this challenge from three aspects: 1) to obtain efficient and robust visual feature representations, we propose MAE-based unsupervised representation learning and IResNet/DenseNet-based supervised representation learning methods; 2) considering the importance of temporal information in videos, we explore three types of sequential encoders to capture temporal information: transformer-based, LSTM-based, and GRU-based encoders; 3) to model the correlations among the different tasks (i.e., valence, arousal, expression, and AU) for multi-task affective analysis, we first explore the dependencies among these tasks and then propose three multi-task learning frameworks to model the correlations effectively. Our system achieves a score of $1.7607$ on the validation set and $1.4361$ on the test set, ranking first in the MTL Challenge. The code is available at https://github.com/AIM3-RUC/ABAW4.
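To make the described pipeline concrete, below is a minimal PyTorch sketch of the architecture outlined in the abstract: pre-extracted per-frame visual features (e.g., from MAE or IResNet/DenseNet) pass through a sequential encoder, here a GRU, one of the three encoder types the paper explores, followed by one output head per task. The class name `MultiTaskAffectModel` and all dimensions (`feat_dim`, `hidden_dim`, the 8 expression classes and 12 AUs) are illustrative assumptions, not the authors' exact configuration or their three MTL frameworks.

```python
# Minimal sketch, assuming pre-extracted per-frame visual features.
# The GRU encoder and plain linear heads are illustrative choices;
# they are not the authors' exact model.
import torch
import torch.nn as nn

class MultiTaskAffectModel(nn.Module):  # hypothetical name
    def __init__(self, feat_dim=768, hidden_dim=256,
                 num_expressions=8, num_aus=12):
        super().__init__()
        # Temporal encoder over per-frame visual features (e.g., MAE or
        # IResNet/DenseNet embeddings). A bidirectional GRU is one of
        # the three sequential encoder types explored in the paper.
        self.encoder = nn.GRU(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        enc_dim = hidden_dim * 2
        # One head per task: valence/arousal regression, expression
        # classification, and multi-label AU detection.
        self.va_head = nn.Linear(enc_dim, 2)             # valence, arousal
        self.expr_head = nn.Linear(enc_dim, num_expressions)
        self.au_head = nn.Linear(enc_dim, num_aus)

    def forward(self, feats):
        # feats: (batch, seq_len, feat_dim) per-frame visual features
        h, _ = self.encoder(feats)                       # (batch, seq_len, enc_dim)
        return {
            "va": torch.tanh(self.va_head(h)),           # continuous values in [-1, 1]
            "expr": self.expr_head(h),                   # expression logits
            "au": self.au_head(h),                       # AU logits (sigmoid + BCE loss)
        }

# Example usage on a batch of 4 clips of 64 frames each
model = MultiTaskAffectModel()
out = model(torch.randn(4, 64, 768))
print({k: v.shape for k, v in out.items()})
```

Swapping the GRU for `nn.LSTM`, or for a `nn.TransformerEncoder` over the same feature sequence, would give the other two encoder variants the abstract mentions.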
