Paper Title

Facial Expression Recognition and Image Description Generation in Vietnamese

Authors

Lam, Khang Nhut; Nguyen, Kim-Ngoc Thi; Nguy, Loc Huu; Kalita, Jugal

Abstract

This paper discusses a facial expression recognition model and a description generation model that together build descriptive sentences for images and for the facial expressions of people in them. Our study shows that YOLOv5 outperforms a traditional CNN for all emotions on the KDEF dataset: the emotion recognition accuracies of the CNN and YOLOv5 models are 0.853 and 0.938, respectively. For image description, we propose a model based on a merge architecture, using VGG16 to encode images and an LSTM to encode captions. YOLOv5 is also used to recognize the dominant colors of objects in the images and, if necessary, to correct color words in the generated descriptions. If a description contains words referring to a person, we recognize the emotion of the person in the image. Finally, we combine the results of all models to create sentences that describe both the visual content and the human emotions in the images. Experimental results on the Flickr8k dataset in Vietnamese achieve BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.628, 0.425, 0.280, and 0.174, respectively.
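The abstract's merge architecture can be illustrated with a minimal NumPy sketch: the image encoder (VGG16 in the paper) and the text encoder (an LSTM in the paper) each produce a feature vector, the two are projected into a common space and merged, and a softmax over the vocabulary predicts the next caption word. All dimensions, weights, and the additive merge below are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes (assumed for illustration; the paper's real encoders
# are VGG16 for images and an LSTM for partial captions).
IMG_DIM, TXT_DIM, HID, VOCAB = 4096, 256, 256, 1000

# Stand-ins for the two encoder outputs:
img_feat = rng.standard_normal(IMG_DIM)  # VGG16-style image feature vector
txt_feat = rng.standard_normal(TXT_DIM)  # LSTM-style encoding of the caption so far

# Randomly initialized projection and output weights (hypothetical).
W_img = rng.standard_normal((HID, IMG_DIM)) * 0.01
W_txt = rng.standard_normal((HID, TXT_DIM)) * 0.01
W_out = rng.standard_normal((VOCAB, HID)) * 0.01

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Merge step: project each modality into the shared space, combine them
# (additively here), then score every vocabulary word as the next token.
merged = relu(W_img @ img_feat) + relu(W_txt @ txt_feat)
probs = softmax(W_out @ merged)  # distribution over the next caption word
```

At decoding time this step is repeated: the predicted word is appended to the partial caption, the text encoder is re-run, and the merge produces the next word until an end token is emitted.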
