Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Paper Title

Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Authors

Zhao, Yi; Huang, Wen-Chin; Tian, Xiaohai; Yamagishi, Junichi; Das, Rohan Kumar; Kinnunen, Tomi; Ling, Zhenhua; Toda, Tomoki

Abstract

The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks, intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, including 3 baselines built on the database. From the results of crowd-sourced listening tests, we observed that VC methods have progressed rapidly thanks to advanced deep learning methods. In particular, speaker similarity scores of several systems turned out to be as high as target speakers in the intra-lingual semi-parallel VC task. However, we confirmed that none of them have achieved human-level naturalness yet for the same task. The cross-lingual conversion task is, as expected, a more difficult task, and the overall naturalness and similarity scores were lower than those for the intra-lingual conversion task. However, we observed encouraging results, and the MOS scores of the best systems were higher than 4.0. We also show a few additional analysis results to aid in understanding cross-lingual VC better.
