SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Paper title

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Authors

Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

Abstract

Most existing deep learning based binaural speaker separation systems focus on producing a monaural estimate for each of the target speakers, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated binaural signals. Specifically, we extend a newly-developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity. We develop an end-to-end multiple-input multiple-output system, which directly maps from the binaural waveform of the mixture to those of the speech signals. The experimental results show that our proposed approach achieves significantly better separation performance than a recent binaural separation approach. In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
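To make the term "interaural cues" concrete, the sketch below estimates the two standard cues, the interaural level difference (ILD) and the interaural time difference (ITD), from a two-channel signal. It is an illustration only: the abstract does not say how cue preservation is measured, so the function names, the broadband (whole-signal) formulation, and the synthetic test signal are all assumptions rather than the authors' evaluation procedure.

```python
import numpy as np


def interaural_level_difference(left, right, eps=1e-12):
    """Broadband ILD in dB; negative when the left channel is quieter."""
    left_energy = np.sum(left ** 2) + eps
    right_energy = np.sum(right ** 2) + eps
    return 10.0 * np.log10(left_energy / right_energy)


def interaural_time_difference(left, right, sample_rate, max_lag):
    """ITD in seconds from the cross-correlation peak.

    Positive when the left channel is a delayed copy of the right one.
    Assumes both channels have the same length.
    """
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = []
    for lag in lags:
        # corr(lag) = sum over n of left[n + lag] * right[n]
        l_seg = left[max(lag, 0): n + min(lag, 0)]
        r_seg = right[max(-lag, 0): n + min(-lag, 0)]
        corr.append(np.dot(l_seg, r_seg))
    best_lag = lags[int(np.argmax(corr))]
    return best_lag / sample_rate


# Hypothetical test signal: white noise reaching the left ear
# 8 samples later and at half the amplitude.
sample_rate = 16000
delay = 8
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
right = source
left = 0.5 * np.concatenate([np.zeros(delay), source[:-delay]])

ild_db = interaural_level_difference(left, right)
itd_s = interaural_time_difference(left, right, sample_rate, max_lag=32)
print(f"ILD: {ild_db:.2f} dB, ITD: {itd_s * 1e3:.3f} ms")
```

A system that outputs only a monaural estimate per speaker has no left and right channel to compare, so neither quantity can be computed from its output; a multiple-input multiple-output system that emits both channels lets the cues of the estimate be compared with those of the reference.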
