Speaker Separation Using Speaker Inventories and Estimated Speech

Paper title

Speaker Separation Using Speaker Inventories and Estimated Speech

Authors

Wang, Peidong; Chen, Zhuo; Wang, DeLiang; Li, Jinyu; Gong, Yifan

Abstract

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation. SSUSIES contains two methods, speaker separation using speaker inventories (SSUSI) and speaker separation using estimated speech (SSUES). SSUSI performs speaker separation with the help of speaker inventory. By combining the advantages of permutation invariant training (PIT) and speech extraction, SSUSI significantly outperforms conventional approaches. SSUES is a widely applicable technique that can substantially improve speaker separation performance using the output of first-pass separation. We evaluate the models on both speaker separation and speech recognition metrics.
