FastVC: Fast Voice Conversion with non-parallel data

Paper title

FastVC: Fast Voice Conversion with non-parallel data

Authors

Mayor, Oriol Barbany; Cernak, Milos

Abstract

This paper introduces FastVC, an end-to-end model for fast Voice Conversion (VC). The proposed model can convert speech of arbitrary length from multiple source speakers to multiple target speakers. FastVC is based on a conditional AutoEncoder (AE) that is trained on non-parallel data and requires no annotations at all. The model's latent representation is shown to be speaker-independent and similar to phonemes, which is a desirable feature for VC systems. While current VC systems focus primarily on achieving the highest overall speech quality, this paper tries to balance that development against the resources needed to run such systems. Despite its simple structure, the proposed model outperforms the VC Challenge 2020 baselines on the cross-lingual task in terms of naturalness.
