Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF

Paper Title

Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF

Authors

Muñoz-Montoro, Antonio J.; Carabias-Orti, Julio J.; Politis, Archontis; Drossos, Konstantinos

Abstract

This work addresses the problem of multichannel source separation combining two powerful approaches, multichannel spectral factorization with recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
