The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

Paper title

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

Authors

Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda

Abstract

This paper presents the sequence-to-sequence (seq2seq) baseline system for the Voice Conversion Challenge (VCC) 2020. We consider a naive approach to voice conversion (VC): first transcribe the input speech with an automatic speech recognition (ASR) model, then use the transcription to generate speech in the target speaker's voice with a text-to-speech (TTS) model. We revisit this method under a seq2seq framework by using ESPnet, an open-source end-to-end speech processing toolkit, together with the many well-configured pretrained models provided by the community. Official evaluation results show that our system ranks at the top among the participating systems in terms of conversion similarity, demonstrating the promising ability of seq2seq models to convert speaker identity. The implementation is open-source at: https://github.com/espnet/espnet/tree/master/egs/vcc20.
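The cascade described in the abstract can be sketched as a simple function composition: an ASR model maps source speech to text, and a target-speaker TTS model maps that text back to speech. The sketch below is a minimal illustration under stated assumptions, not the ESPnet recipe: the names `cascade_vc`, `toy_asr`, and `toy_tts` are hypothetical, waveforms are assumed to be plain lists of float samples, and the two toy models are stand-ins for trained networks.

```python
from typing import Callable, List

# Assumption: a waveform is represented as a plain list of float samples.
Waveform = List[float]
# Hypothetical interface: an ASR model maps source speech to a transcription.
AsrFn = Callable[[Waveform], str]
# Hypothetical interface: a TTS model maps text to speech in the target voice.
TtsFn = Callable[[str], Waveform]


def cascade_vc(source: Waveform, asr: AsrFn, tts: TtsFn) -> Waveform:
    """Convert speech by transcribing it, then re-synthesizing the text.

    The text acts as a bottleneck: the source speaker's identity is
    discarded, and only the linguistic content reaches the TTS model.
    """
    transcription = asr(source)
    return tts(transcription)


def toy_asr(wave: Waveform) -> str:
    # Hypothetical stand-in ASR: pretends every utterance says "hello".
    return "hello"


def toy_tts(text: str) -> Waveform:
    # Hypothetical stand-in TTS: emits one sample per character.
    return [ord(c) / 255.0 for c in text]
```

Because the output depends only on the transcription, two different source recordings of the same sentence yield the same converted speech in this sketch; this is why such a cascade can convert speaker identity well, while any ASR error propagates directly into the output.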
