Paper Title

What's All the FUSS About Free Universal Sound Separation Data?

Paper Authors

Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

Paper Abstract

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate impulse responses of box shaped rooms with frequency-dependent reflective walls. Additional open-source data augmentation tools are also provided to produce new mixtures with different combinations of sources and room simulations. Finally, we introduce an open-source baseline separation model, based on an improved time-domain convolutional network (TDCN++), that can separate a variable number of sources in a mixture. This model achieves 9.8 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source inputs with 35.5 dB absolute SI-SNR. We hope this dataset will lower the barrier to new research and allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.
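The abstract reports results in terms of SI-SNR and SI-SNR improvement (SI-SNRi). For readers unfamiliar with the metric, below is a minimal NumPy sketch of the usual scale-invariant SNR definition, in which the estimate is projected onto the reference before computing the signal-to-error ratio. The function names, the mean removal, and the `eps` guard are illustrative choices, not code from the FUSS release, which may handle details such as silent or single-source references differently.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB.

    Projects the estimate onto the reference so the score is invariant
    to the overall scale of the estimate.
    """
    # Remove DC offsets so the projection is well defined.
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)

    # Scaled projection of the estimate onto the reference.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    error = estimate - target

    return 10.0 * np.log10(
        (np.sum(target ** 2) + eps) / (np.sum(error ** 2) + eps))


def si_snr_improvement(estimate, reference, mixture):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


if __name__ == "__main__":
    # Toy check: a rescaled copy of the reference scores far above the mixture.
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16000)       # 1 s of "reference" audio at 16 kHz
    mix = s + rng.standard_normal(16000)
    print(si_snr_improvement(0.5 * s, s, mix))
```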
