Paper Title


Learning Decoupling Features Through Orthogonality Regularization

Authors

Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

Abstract


Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications. Research shows that state-of-the-art KWS and SV models are trained independently on different datasets, since each is expected to learn distinctive acoustic features. However, humans can distinguish language content and speaker identity simultaneously. Motivated by this, we believe it is important to explore a method that can effectively extract common features while decoupling task-specific features. Bearing this in mind, a two-branch deep network (a KWS branch and an SV branch) with the same network structure is developed, and a novel decoupled feature learning method is proposed to improve the performance of KWS and SV simultaneously, where speaker-invariant keyword representations and keyword-invariant speaker representations are expected, respectively. Experiments are conducted on the Google Speech Commands Dataset (GSCD). The results demonstrate that the orthogonality regularization helps the network achieve SOTA EERs of 1.31% and 1.87% on KWS and SV, respectively.
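To make the core idea concrete, the orthogonality regularization described in the abstract can be sketched as a penalty that pushes the keyword embedding and the speaker embedding of each utterance toward orthogonality. The formulation below (mean squared cosine similarity between the two branch embeddings) is a hypothetical illustration, not the paper's exact loss; the function and weight names are assumptions.

```python
# Hypothetical sketch of an orthogonality regularizer between the two
# branch embeddings; the paper's exact formulation may differ.
import torch
import torch.nn.functional as F


def orthogonality_loss(kws_emb: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between keyword and speaker representations.

    Both tensors have shape (batch, dim). Each embedding is L2-normalized,
    and the penalty is the mean squared cosine similarity between the
    paired embeddings: it is zero when every keyword embedding is
    orthogonal to its speaker embedding, and 1 when they coincide.
    """
    k = F.normalize(kws_emb, dim=1)
    s = F.normalize(spk_emb, dim=1)
    cos = (k * s).sum(dim=1)   # per-utterance cosine similarity
    return (cos ** 2).mean()


# Usage: add the penalty to the two branch losses with a weight `lam`
# (both names are placeholders for illustration):
# total_loss = kws_loss + sv_loss + lam * orthogonality_loss(k_emb, s_emb)
```

Minimizing this term alongside the two task losses is one way to encourage speaker-invariant keyword representations and keyword-invariant speaker representations, as the abstract describes.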
