Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

Paper Title

Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

Authors

Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson

Abstract

We propose an end-to-end trainable approach to single-channel speech separation with an unknown number of speakers. Our approach extends the MulCat source separation backbone with additional output heads: a count-head to infer the number of speakers, and decoder-heads for reconstructing the original signals. Beyond the model, we also propose a metric for evaluating source separation with a variable number of speakers. Specifically, we clarify how to evaluate quality when the ground truth has more or fewer speakers than the model predicts. We evaluate our approach on the WSJ0-mix datasets, with mixtures of up to five speakers. We demonstrate that our approach outperforms the state of the art in counting the number of speakers and remains competitive in the quality of the reconstructed signals.
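The head layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `MultiHeadSeparator`, the feature size, the random linear layers, the mean pooling, and the argmax rule for choosing a decoder head are all assumptions made for illustration. Only the overall shape (shared features, one count-head, several decoder-heads, at most five speakers) comes from the abstract.

```python
import random

MAX_SPEAKERS = 5  # the abstract mentions mixtures of up to five speakers
FEATURE_DIM = 8   # hypothetical feature size, chosen only for illustration


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random weights (stand-in for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MultiHeadSeparator:
    """Toy stand-in: shared features feed one count head and several decoder heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Count head: one logit per candidate speaker count (1..MAX_SPEAKERS).
        self.count_head = random_matrix(MAX_SPEAKERS, FEATURE_DIM, rng)
        # One decoder head per speaker count; head k emits k values per frame.
        self.decoder_heads = {
            k: random_matrix(k, FEATURE_DIM, rng)
            for k in range(1, MAX_SPEAKERS + 1)
        }

    def separate(self, features):
        """features: list of frames, each a list of FEATURE_DIM floats."""
        # Average the features over time, then score each candidate count.
        pooled = [sum(col) / len(features) for col in zip(*features)]
        logits = linear(self.count_head, pooled)
        # Assumption: pick the count with the highest logit.
        num_speakers = logits.index(max(logits)) + 1
        # Run only the decoder head that matches the predicted count.
        head = self.decoder_heads[num_speakers]
        frames = [linear(head, frame) for frame in features]
        # Transpose so there is one output sequence per estimated speaker.
        sources = [list(track) for track in zip(*frames)]
        return num_speakers, sources
```

Calling `MultiHeadSeparator().separate(features)` on a list of feature frames returns the predicted speaker count together with that many output sequences, each as long as the input. In a real system the features would come from the separation backbone and the heads would be trained jointly rather than randomly initialized.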

Paper Title