Paper Title

Dual Attention in Time and Frequency Domain for Voice Activity Detection

Authors

Joohyung Lee, Youngmoon Jung, Hoirin Kim

Abstract

Voice activity detection (VAD) is a challenging task in low signal-to-noise ratio (SNR) environments, especially under non-stationary noise. To address this, we propose a novel attention module that can be integrated into Long Short-Term Memory (LSTM) networks. The proposed module refines each LSTM layer's hidden states so that the model can adaptively focus on both the time and frequency domains. Experiments are conducted under various noisy conditions using the Aurora 4 database. The proposed method achieves an area under the ROC curve (AUC) of 95.58%, a 22.05% relative improvement over the baseline, with only a 2.44% increase in the number of parameters. In addition, we use focal loss to alleviate the performance degradation caused by the imbalance between speech and non-speech sections in the training sets. The results show that focal loss improves performance in various imbalance situations compared to cross-entropy loss, a commonly used loss function in VAD.
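
The abstract contrasts focal loss with cross-entropy loss for imbalanced speech/non-speech frames. The following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single frame. The function names and the values `gamma=2.0` and `alpha=0.5` are illustrative assumptions; the abstract does not state the settings the authors used.

```python
import math

def cross_entropy(p_speech, label, eps=1e-7):
    """Binary cross-entropy for one frame (label 1 = speech, 0 = non-speech)."""
    p = min(max(p_speech, eps), 1.0 - eps)   # clamp to avoid log(0)
    p_t = p if label == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)

def binary_focal_loss(p_speech, label, gamma=2.0, alpha=0.5, eps=1e-7):
    """Standard binary focal loss for one frame.

    gamma and alpha are assumed values for illustration only,
    not the settings reported in the paper.
    """
    p = min(max(p_speech, eps), 1.0 - eps)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha   # per-class weight
    # (1 - p_t)^gamma shrinks the loss of frames that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

if __name__ == "__main__":
    for p in (0.9, 0.6, 0.1):
        print(p, cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to a class-weighted cross-entropy. Larger `gamma` values shift the training signal toward hard, misclassified frames, which is the property that makes focal loss attractive when one class dominates the training set.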
