Paper title

In defence of metric learning for speaker recognition

Authors

Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han

Abstract

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of the most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss shows competitive performance compared to classification-based losses, and that networks trained with our proposed metric learning objective outperform state-of-the-art methods.
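As a rough illustration of the "vanilla triplet loss" mentioned above, here is a minimal sketch of the standard formulation max(0, d(a, p) - d(a, n) + m), using Euclidean distance on plain Python lists. The anchor a and positive p come from the same speaker and the negative n from a different one, so the loss pushes intra-speaker distance below inter-speaker distance by at least the margin m. The function names and the margin value 0.3 are illustrative assumptions, not taken from the paper, and this is not the authors' proposed objective.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.3) -> float:
    """Vanilla triplet loss for a single (anchor, positive, negative) triplet.

    anchor/positive: embeddings of two utterances by the same speaker.
    negative: embedding of an utterance by a different speaker.
    margin: hypothetical default; the value is an assumption for illustration.
    """
    d_ap = euclidean(anchor, positive)  # intra-speaker distance
    d_an = euclidean(anchor, negative)  # inter-speaker distance
    # Zero once the negative is farther than the positive by at least `margin`.
    return max(0.0, d_ap - d_an + margin)


def mean_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                      margin: float = 0.3) -> float:
    """Average the per-triplet loss over a (non-empty) batch of triplets."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])    # negative already far away
    hard = ([1.0, 0.0], [0.9, 0.1], [0.95, 0.05])  # negative closer than positive
    print(triplet_loss(*easy))                      # margin satisfied, loss is 0.0
    print(triplet_loss(*hard))                      # positive loss
    print(mean_triplet_loss([easy, hard]))
```

In the "easy" triplet the different-speaker embedding is already well separated, so it contributes nothing; only the "hard" triplet, whose negative sits closer to the anchor than the positive does, produces a gradient signal. This is one reason practical triplet training depends heavily on how triplets are mined.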