Paper Title


Optimizing Speech Emotion Recognition using Manta-Ray Based Feature Selection

Paper Authors

Soham Chattopadhyay, Arijit Dey, Hritam Basak

Paper Abstract


Emotion recognition from audio signals has been regarded as a challenging task in signal processing, as it can be considered a collection of static and dynamic classification tasks. Recognition of emotions from speech data has relied heavily on end-to-end feature extraction and classification using machine learning models, though the absence of feature selection and optimization has restrained the performance of these methods. Recent studies have shown that Mel Frequency Cepstral Coefficients (MFCC) have emerged as one of the most widely relied-upon feature extraction methods, though their very small feature dimension circumscribes classification accuracy. In this paper, we propose that concatenating features extracted by different existing feature extraction methods not only boosts classification accuracy but also expands the possibility of efficient feature selection. We have used Linear Predictive Coding (LPC) in addition to MFCC for feature extraction, before feature merging. Besides, we have performed a novel application of Manta Ray optimization to speech emotion recognition tasks, yielding state-of-the-art results in this field. We have evaluated the performance of our model on SAVEE and Emo-DB, two publicly available datasets. Our proposed method outperformed all existing methods in speech emotion analysis on these two datasets, with classification accuracies of 97.06% and 97.68% respectively.
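
As a concrete illustration of the feature-concatenation step described in the abstract, the sketch below extracts MFCC and LPC features from one utterance and stacks them into a single vector. This is a minimal sketch, not the authors' exact pipeline: the use of librosa, the coefficient orders, and the mean-over-frames pooling are assumptions made only for illustration.

```python
# Minimal sketch of MFCC + LPC feature concatenation for one utterance.
# Assumptions: librosa is available; n_mfcc=13 and lpc_order=12 are
# illustrative choices, not the values used in the paper.
import numpy as np
import librosa

def extract_features(wav_path, n_mfcc=13, lpc_order=12):
    """Return a concatenated MFCC + LPC feature vector for one utterance."""
    y, sr = librosa.load(wav_path, sr=None)            # keep native sample rate

    # MFCC: shape (n_mfcc, frames); mean over frames gives a fixed-length vector
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mfcc_vec = mfcc.mean(axis=1)

    # LPC: linear-predictive coefficients computed over the whole utterance
    lpc_vec = librosa.lpc(y, order=lpc_order)

    # Concatenate into one vector; a feature selector then picks a subset
    return np.concatenate([mfcc_vec, lpc_vec])
```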

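The feature-selection step can likewise be sketched as a wrapper search over binary feature masks scored by classifier accuracy. The loop below uses a plain random search as a stand-in for the Manta Ray Foraging Optimization update rules, which are not reproduced here; the SVM classifier, the 5-fold scoring, and the arrays `X` (samples x features) and `y` (labels) are assumptions for illustration only.

```python
# Simplified wrapper-style feature selection: candidate binary masks over the
# concatenated feature vector are scored by cross-validated classifier
# accuracy and the best mask is kept. A random search replaces the manta-ray
# update rules used in the paper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_features(X, y, n_iter=50, seed=0):
    """Random-search wrapper selection over binary feature masks."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    best_mask, best_score = np.ones(n_features, dtype=bool), -np.inf

    for _ in range(n_iter):
        mask = rng.random(n_features) < 0.5          # candidate feature subset
        if not mask.any():                           # keep at least one feature
            continue
        score = cross_val_score(SVC(), X[:, mask], y, cv=5).mean()
        if score > best_score:
            best_mask, best_score = mask, score

    return best_mask, best_score
```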