Few Shot Text-Independent speaker verification using 3D-CNN

Paper title

Few Shot Text-Independent speaker verification using 3D-CNN

Paper author

Mishra, Prateek

Abstract

Facial recognition is one of the major successes of artificial intelligence and has been used widely in recent years. However, images are not the only biometric: audio is another biometric that can serve as an alternative to existing recognition systems. Text-independent audio data is not always available for tasks such as speaker verification, and no prior work has addressed text-independent speaker verification under the assumption of very little training data. In this paper, we therefore propose a novel method to verify the identity of a claimed speaker using very little training data. To achieve this, we use a Siamese neural network with center loss and speaker bias loss. Experiments on the VoxCeleb1 dataset show that the proposed model, even when trained with very little data, achieves accuracy close to that of the state-of-the-art model for text-independent speaker verification.
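The abstract names center loss as one of the training objectives but does not give its formula. As a rough illustration only, the sketch below follows the commonly used definition of center loss: half the squared distance between each embedding and the center of its speaker's class. The function names, the use of batch means as centers (in practice the centers are usually updated gradually during training), and the averaging over the batch are all assumptions made for this example and are not taken from the paper. The speaker bias loss is not sketched because the abstract does not define it.

```python
from collections import defaultdict


def class_centers(embeddings, labels):
    """Return the mean embedding (center) for each speaker label.

    Assumption: centers are plain batch means, a simplification of the
    running-update centers typically used with center loss.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def center_loss(embeddings, labels, centers):
    """Half the squared distance to the speaker center, averaged over the batch."""
    total = 0.0
    for vec, label in zip(embeddings, labels):
        total += sum((v - c) ** 2 for v, c in zip(vec, centers[label]))
    return 0.5 * total / len(embeddings)


# Hypothetical toy batch: two 2-D embeddings for each of two speakers.
embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
labels = ["a", "a", "b", "b"]
centers = class_centers(embeddings, labels)
print(center_loss(embeddings, labels, centers))
```

Minimizing this term pulls embeddings of the same speaker toward a shared center, which is the usual motivation for pairing it with a verification-style objective such as a Siamese network.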

