Paper Title
Wavesplit: End-to-End Speech Separation by Speaker Clustering
Paper Authors
Paper Abstract
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation. For speech separation, our sequence-wide speaker representations provide a more robust separation of long, challenging recordings compared to prior work. Wavesplit redefines the state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2/3mix), as well as in noisy and reverberated settings (WHAM/WHAMR). We also set a new benchmark on the recent LibriMix dataset. Finally, we show that Wavesplit is also applicable to other domains, by separating fetal and maternal heart rates from a single abdominal electrocardiogram.
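The clustering idea mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' model or training objective: the per-frame vectors here are synthetic, plain k-means stands in for whatever clustering the system uses, and every name (`kmeans`, `true_vectors`, `frames`) is hypothetical. It only shows why pooling per-frame vectors into cluster centroids yields one representation per source regardless of the order in which the vectors appear at each frame.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.stack(centroids)
    for _ in range(n_iter):
        dists = np.sum((points[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        centroids = np.stack([points[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

rng = np.random.default_rng(0)

# Assumption: two sources, each with a fixed (made-up) 2-D representation.
true_vectors = np.array([[2.0, 0.0], [-2.0, 0.0]])
n_frames = 100

# Noisy per-frame vectors, shape (frames, sources, dim).
frames = true_vectors[None] + 0.1 * rng.standard_normal((n_frames, 2, 2))

# Swap the source order at random frames to mimic the permutation ambiguity.
swap = rng.random(n_frames) < 0.5
frames[swap] = frames[swap][:, ::-1]

# Clustering ignores the per-frame order: pool all vectors, then group them.
points = frames.reshape(-1, 2)
centroids, labels = kmeans(points, k=2)
print(np.round(centroids, 2))
```

Each centroid lands close to one of the two underlying vectors, and every frame contributes exactly one vector to each cluster, so the set of centroids is a sequence-wide summary that does not depend on how the sources were ordered frame by frame.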