Speaker Normalization for Self-supervised Speech Emotion Recognition

Paper title

Speaker Normalization for Self-supervised Speech Emotion Recognition

Authors

Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory

Abstract

Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.

Paper title