Paper title
Unsupervised Cross-Domain Singing Voice Conversion
Paper authors
Abstract
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method uses an acoustic model, trained for automatic speech recognition, together with extracted melody features to drive a waveform-based generator. The proposed generative architecture is invariant to the speaker's identity and can be trained to generate target singers from unlabeled training data, using either speech or singing sources. The model is optimized end-to-end without any manual supervision such as lyrics, musical notes, or parallel samples. The proposed approach is fully convolutional and can generate audio in real time. Experiments show that our method significantly outperforms the baseline methods while generating convincingly better audio samples than alternative attempts.