Paper Title

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

Authors

Wu, Yi-Chiao, Tobing, Patrick Lumban, Yasuhara, Kazuki, Matsunaga, Noriyuki, Ohtani, Yamato, Toda, Tomoki

Abstract

Neural-based text-to-speech (TTS) systems achieve very high-fidelity speech generation because of rapid developments in neural networks. However, their requirements of a huge labeled corpus and high computation cost make it difficult for small companies or individuals to develop a high-fidelity TTS system. On the other hand, a neural vocoder, which has been widely adopted for speech generation in neural-based TTS systems, can be trained with a relatively small unlabeled corpus. Therefore, in this paper, we explore a general framework for developing a neural post-filter (NPF) for low-cost TTS systems using neural vocoders. A cyclical approach is proposed to tackle the acoustic and temporal mismatches (AM and TM) that arise in developing an NPF. Both objective and subjective evaluations have been conducted to demonstrate the AM and TM problems and the effectiveness of the proposed framework.