Paper Title

Video-based Facial Expression Recognition using Graph Convolutional Networks

Paper Authors

Daizong Liu, Hongting Zhang, Pan Zhou

Abstract

Facial expression recognition (FER), which aims to classify the expression present in a facial image or video, has attracted a lot of research interest in the fields of artificial intelligence and multimedia. For the video-based FER task, it is sensible to capture the dynamic expression variation across frames to recognize the facial expression. However, existing methods directly utilize CNN-RNN or 3D CNN models to extract spatial-temporal features from different facial units, rather than concentrating on specific regions while capturing the expression variation, which leads to limited performance in FER. In this paper, we introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based FER. First, the GCN layer is used to learn more significant facial expression features that concentrate on certain regions by sharing information among the CNN features extracted for the nodes. Then, an LSTM layer is applied to learn long-term dependencies among the GCN-learned features and model the variation. In addition, a weight assignment mechanism is designed to weight the outputs of different nodes for the final classification by characterizing the expression intensity in each frame. To the best of our knowledge, this is the first time a GCN has been used for the FER task. We evaluate our method on three widely used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging in-the-wild dataset AFEW8.0, and the experimental results demonstrate that our method achieves superior performance to existing methods.
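The abstract describes a per-frame CNN feeding node features into a GCN layer, an LSTM over frames, and a frame-weighting step before classification. Below is a minimal PyTorch sketch of that pipeline; the node layout (a 4×4 grid of spatial CNN features), the fully connected normalized adjacency matrix, the layer sizes, and the softmax frame-scoring head are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a CNN -> GCN -> LSTM -> weighted-classification pipeline.
# All architectural details (node count, adjacency, dimensions) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step over a fixed, row-normalized adjacency."""

    def __init__(self, in_dim, out_dim, num_nodes):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Fully connected graph over facial-region nodes (assumption);
        # row-normalized so each node averages information from all nodes.
        adj = torch.ones(num_nodes, num_nodes)
        self.register_buffer("adj", adj / adj.sum(dim=1, keepdim=True))

    def forward(self, x):                       # x: (batch, num_nodes, in_dim)
        return F.relu(self.linear(self.adj @ x))


class VideoFERSketch(nn.Module):
    def __init__(self, num_nodes=16, feat_dim=128, hidden_dim=128, num_classes=7):
        super().__init__()
        # Lightweight per-frame CNN backbone (stand-in for the paper's CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),            # -> (64, 4, 4): 16 spatial nodes
        )
        self.node_proj = nn.Linear(64, feat_dim)
        self.gcn = SimpleGCNLayer(feat_dim, feat_dim, num_nodes)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Frame-scoring head standing in for the expression-intensity weighting.
        self.frame_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, video):                   # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))           # (b*t, 64, 4, 4)
        nodes = feats.flatten(2).transpose(1, 2)        # (b*t, 16, 64)
        nodes = self.gcn(self.node_proj(nodes))         # share info across nodes
        frame_feat = nodes.mean(dim=1).view(b, t, -1)   # pool nodes per frame
        seq, _ = self.lstm(frame_feat)                  # (b, t, hidden_dim)
        weights = torch.softmax(self.frame_score(seq), dim=1)  # frame weights
        pooled = (weights * seq).sum(dim=1)             # weighted temporal pooling
        return self.classifier(pooled)


if __name__ == "__main__":
    model = VideoFERSketch()
    clip = torch.randn(2, 8, 3, 112, 112)   # 2 clips, 8 frames each
    print(model(clip).shape)                 # torch.Size([2, 7])
```

The frame-level softmax weighting here is one simple way to realize the paper's idea of emphasizing frames with stronger expression intensity; the original weight assignment mechanism may differ in detail.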
