Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Paper Title

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Authors

Li, Yanxiong, Wang, Wucheng, Chen, Hao, Cao, Wenchang, Li, Wei, He, Qianhua

Abstract

Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification. In the task of few-shot learning, overfitting is a tough problem mainly due to the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method which can alleviate the overfitting problem. In the proposed method, the model of a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method exceeds state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.
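
The abstract names a prototypical loss function but gives no formula. The following is a minimal sketch of the standard prototypical-network loss, assuming squared Euclidean distance and plain Python lists as stand-ins for the embeddings that the network would produce. The function names (`prototypes`, `sq_dist`, `prototypical_loss`) and the toy data are hypothetical and are not taken from the paper, which may differ in its details.

```python
import math

def prototypes(support, labels):
    """Return the mean embedding (prototype) of each class in the support set."""
    sums, counts = {}, {}
    for vec, lab in zip(support, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def sq_dist(a, b):
    """Squared Euclidean distance (an assumption; other metrics are possible)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototypical_loss(support, support_labels, query, query_labels):
    """Average negative log-probability of the true class for each query,
    where class scores are negative distances to the class prototypes.
    Also returns the nearest-prototype accuracy on the query set."""
    protos = prototypes(support, support_labels)
    classes = sorted(protos)
    total, correct = 0.0, 0
    for vec, lab in zip(query, query_labels):
        logits = [-sq_dist(vec, protos[c]) for c in classes]
        m = max(logits)
        # Numerically stable log-sum-exp over the class scores.
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[classes.index(lab)]
        correct += classes[logits.index(m)] == lab
    n = len(query)
    return total / n, correct / n

# Toy 2-way, 2-shot episode with hypothetical 2-D "speaker embeddings".
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [4.0, 6.0]]
support_labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
query = [[0.0, 1.0], [4.0, 5.0]]
query_labels = ["spk_a", "spk_b"]
loss, acc = prototypical_loss(support, support_labels, query, query_labels)
```

In this toy episode each query coincides with its own class prototype, so the loss is close to zero. A query equidistant from both prototypes would instead contribute a loss of log 2.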
