Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks

Paper title

Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks

Authors

Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

Abstract

We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN. The main features of our method, called StarGAN-VC, are as follows: First, it requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training. Second, it can simultaneously learn mappings across multiple domains using a single generator network and thus fully exploit available training data collected from multiple domains to capture latent features that are common to all the domains. Third, it can generate converted speech signals quickly enough to allow real-time implementations and requires only several minutes of training examples to generate reasonably realistic-sounding speech. In this paper, we describe three formulations of StarGAN, including a newly introduced novel StarGAN variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a nonparallel VC task. We also compare them with several baseline methods.
