Paper Title
Many-to-Many Voice Conversion using Conditional Cycle-Consistent Adversarial Networks
Paper Authors
Paper Abstract
Voice conversion (VC) refers to transforming the speaker characteristics of an utterance without altering its linguistic content. Many voice conversion methods require parallel training data, which is expensive to acquire. Recently, the cycle-consistent adversarial network (CycleGAN), which does not require parallel training data, has been applied to voice conversion and has shown state-of-the-art performance. CycleGAN-based voice conversion, however, works only for a single pair of speakers, i.e., one-to-one voice conversion between two speakers. In this paper, we extend the CycleGAN by conditioning the network on speakers. As a result, the proposed method can perform many-to-many voice conversion among multiple speakers using a single generative adversarial network (GAN). Compared to building a separate CycleGAN for each pair of speakers, the proposed method significantly reduces the computational and spatial cost without compromising the sound quality of the converted voice. Experimental results on the VCC2018 corpus confirm the efficiency of the proposed method.
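To make the cost argument concrete, here is a minimal Python sketch under stated assumptions. It counts how many one-to-one CycleGANs would be needed to cover every speaker pair, assuming one CycleGAN serves both directions of a pair, and it illustrates one possible way to condition a single network on speakers. The abstract does not say how the conditioning is implemented, so the one-hot concatenation below, the use of both source and target codes, and the names pairwise_cyclegan_count, one_hot, and condition_frame are all hypothetical illustrations rather than the authors' method.

```python
from itertools import combinations


def pairwise_cyclegan_count(num_speakers):
    """Count the one-to-one CycleGANs needed to cover every unordered speaker pair.

    Assumption: one CycleGAN serves both conversion directions of a pair.
    """
    return len(list(combinations(range(num_speakers), 2)))


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("speaker index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def condition_frame(features, source_id, target_id, num_speakers):
    """Append source and target speaker codes to one acoustic feature frame.

    Hypothetical conditioning scheme for illustration; not taken from the paper.
    """
    return (
        list(features)
        + one_hot(source_id, num_speakers)
        + one_hot(target_id, num_speakers)
    )


if __name__ == "__main__":
    # Pairwise models grow quadratically; the conditional approach keeps one model.
    for n in (2, 4, 8):
        print(n, "speakers:", pairwise_cyclegan_count(n),
              "pairwise CycleGANs vs 1 conditional model")
    # A toy 2-dimensional frame conditioned on converting speaker 0 to speaker 2.
    print(condition_frame([0.1, 0.2], source_id=0, target_id=2, num_speakers=3))
```

The number of pairwise models is N(N-1)/2 for N speakers, so it grows quadratically, whereas a conditional model only widens its input by the length of the speaker codes.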