AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Paper Title

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Authors

Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee

Abstract

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentanglement-based learning techniques to separate the speaker information and the linguistic content information in a speech signal. They then convert the voice by replacing the speaker information with that of the target speaker. To prevent speaker information from leaking into the content embeddings, previous works either reduce the dimension of the content embedding or quantize it, so that it acts as a strong information bottleneck. These mechanisms hurt the synthesis quality to some degree. In this work, we propose AGAIN-VC, an innovative VC system using Activation Guidance and Adaptive Instance Normalization. AGAIN-VC is an auto-encoder-based model comprising a single encoder and a decoder. With a proper activation serving as an information bottleneck on the content embeddings, the trade-off between the synthesis quality and the speaker similarity of the converted speech is improved drastically. This one-shot VC system obtains the best performance in both subjective and objective evaluations.
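
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature layout (a list of channels, each a list of values over time), the `EPS` constant, and the choice of a sigmoid as the bottleneck activation are all assumptions made for illustration. Only the adaptive instance normalization formula itself, which normalizes each channel and then rescales it with the target's per-channel mean and standard deviation, is standard.

```python
import math

EPS = 1e-5  # assumed small constant for numerical stability


def channel_stats(channel):
    """Return the mean and standard deviation of one channel over time."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + EPS)


def instance_norm(features):
    """Normalize each channel over time.

    Returns the normalized features and the removed (mean, std) pairs,
    which here stand in for the speaker information.
    """
    normalized, stats = [], []
    for channel in features:
        mean, std = channel_stats(channel)
        normalized.append([(v - mean) / std for v in channel])
        stats.append((mean, std))
    return normalized, stats


def sigmoid_bottleneck(features):
    """Squash content features into (0, 1); sigmoid is an assumed choice."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in channel] for channel in features]


def adain(content, target_stats):
    """Re-normalize content per channel, then apply the target mean and std."""
    normalized, _ = instance_norm(content)
    return [
        [std * v + mean for v in channel]
        for channel, (mean, std) in zip(normalized, target_stats)
    ]


def convert(source_features, target_features):
    """Toy conversion: source content restyled with target channel statistics."""
    content, _ = instance_norm(source_features)       # strip source statistics
    content = sigmoid_bottleneck(content)             # activation as bottleneck
    _, target_stats = instance_norm(target_features)  # target speaker statistics
    return adain(content, target_stats)
```

Calling `convert(source, target)` on two small feature lists returns features with the time length of the source and, per channel, the mean and standard deviation of the target. In a real system the encoder and decoder would be learned networks wrapped around these operations.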

Paper Title