Paper Title


LibriMix: An Open-Source Dataset for Generalizable Speech Separation

Authors

Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Abstract


In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown significant performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. To enable fair cross-dataset evaluation, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean and noisy conditions. Aiming towards evaluation in more realistic, conversation-like scenarios, we also release a sparsely overlapping version of LibriMix's test set.
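The abstract describes LibriMix as two- or three-speaker mixtures optionally combined with WHAM! noise. The sketch below illustrates the general idea of building such a mixture: scale each source to a common level, then sum them. It is a simplified stand-in, not the dataset's actual recipe; the real LibriMix generation scripts use LUFS-based loudness normalization (via `pyloudnorm`) and random gain draws, and the `make_mixture` function and `target_rms` parameter here are hypothetical names for illustration only.

```python
import numpy as np

def make_mixture(sources, noise=None, target_rms=0.1):
    """Sum single-channel speech sources (and optional noise) into a mixture.

    Each signal is rescaled to a common RMS before summation. This is a
    crude stand-in for LibriMix's LUFS-based loudness normalization.
    """
    def rescale(x):
        rms = np.sqrt(np.mean(x ** 2))
        return x * (target_rms / max(rms, 1e-8))

    signals = [rescale(np.asarray(s, dtype=float)) for s in sources]
    if noise is not None:
        signals.append(rescale(np.asarray(noise, dtype=float)))

    # Zero-pad to the longest signal ("max" mode); LibriMix also provides
    # a "min" mode that truncates the mixture to the shortest source.
    n = max(len(s) for s in signals)
    mixture = np.zeros(n)
    for s in signals:
        mixture[:len(s)] += s
    return mixture
```

In "max" mode, separation models are trained to recover full utterances even where only one speaker is active, which is closer to the sparsely overlapping, conversation-like test condition the abstract mentions.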
