Wavesplit: End-to-End Speech Separation by Speaker Clustering

Paper title

Wavesplit: End-to-End Speech Separation by Speaker Clustering

Authors

Neil Zeghidour, David Grangier

Abstract

We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation. For speech separation, our sequence-wide speaker representations provide a more robust separation of long, challenging recordings compared to prior work. Wavesplit redefines the state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2/3mix), as well as in noisy and reverberated settings (WHAM/WHAMR). We also set a new benchmark on the recent LibriMix dataset. Finally, we show that Wavesplit is also applicable to other domains, by separating fetal and maternal heart rates from a single abdominal electrocardiogram.
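The abstract's claim that clustering sidesteps the permutation problem can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes synthetic 2-D "speaker vectors", an arbitrary noise level, and a hypothetical helper `kmeans`. Each frame emits one vector per speaker in an arbitrary slot order; pooling all vectors and clustering them yields one sequence-wide centroid per speaker, regardless of how the slots were ordered in each frame.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with farthest-point initialisation (illustrative helper)."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest already-chosen centroid.
        d = np.min([((points - c) ** 2).sum(-1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.stack(centroids).astype(float)
    for _ in range(n_iter):
        # Assign each point to its closest centroid, then recompute the means.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Synthetic data (assumption): two speakers, 200 frames, 2-D vectors.
rng = np.random.default_rng(0)
true_speakers = np.array([[1.0, 0.0], [-1.0, 0.0]])
n_frames = 200
frames = true_speakers[None] + 0.1 * rng.standard_normal((n_frames, 2, 2))

# Swap the two output slots in roughly half of the frames, so that the
# slot index carries no information about speaker identity.
swap = rng.random(n_frames) < 0.5
frames[swap] = frames[swap][:, ::-1]

# Pool every per-frame vector and cluster: one centroid per speaker.
centroids, labels = kmeans(frames.reshape(-1, 2), k=2)
labels = labels.reshape(n_frames, 2)
```

In this toy setting the two vectors of each frame land in different clusters, so the cluster index, rather than the output slot, gives a consistent speaker assignment over the whole sequence.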
