An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Paper Title

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

Authors

Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li

Abstract

Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this paper, we provide a comprehensive overview of the state-of-the-art of voice conversion techniques and their performance evaluation methods from the statistical approaches to deep learning, and discuss their promise and limitations. We will also report the recent Voice Conversion Challenges (VCC), the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.

Paper Title