Paper title
Into-TTS: Intonation Template Based Prosody Control System
Paper authors
Paper abstract
Intonation plays an important role in conveying a speaker's intention. However, current end-to-end TTS systems often fail to model proper intonation. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervised manner. Two proposed modules are added to the end-to-end TTS framework: an intonation predictor and an intonation encoder. The intonation predictor recommends a suitable intonation template for the given text. The intonation encoder, attached to the text encoder output, synthesizes speech that abides by the requested intonation template. The main contributions of our paper are: (a) an easy-to-use intonation control system covering a wide range of users; (b) better performance in wrapping speech in a requested intonation, with improved objective and subjective evaluation; and (c) the incorporation of a pre-trained language model for intonation modelling. Audio samples are available at https://srtts.github.io/IntoTTS.
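The abstract says that speech data are grouped into intonation templates in an unsupervised manner but does not specify how. The following is a minimal sketch of one way such a grouping could work, not the authors' method. It assumes each utterance is represented by a short pitch (F0) contour in Hz, resamples contours to a fixed length, removes the mean pitch so that only the contour shape matters, and clusters the results with plain k-means. All function names, the choice of k-means, and the farthest-point initialization are assumptions made for illustration.

```python
def resample(contour, n_points=8):
    """Linearly interpolate a pitch contour to n_points values (n_points >= 2)."""
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def normalize(contour):
    """Subtract the mean so clustering compares contour shape, not pitch level."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def group_into_templates(contours, k, n_points=8):
    """Return one hypothetical template ID per pitch contour."""
    points = [normalize(resample(c, n_points)) for c in contours]
    labels, _ = kmeans(points, k)
    return labels


if __name__ == "__main__":
    # Toy data (made up): three rising and three falling contours, in Hz.
    rising = [[100, 110, 130, 160], [200, 205, 230, 270], [150, 160, 175, 210]]
    falling = [[180, 160, 140, 120], [250, 240, 200, 170], [130, 120, 100, 90]]
    print(group_into_templates(rising + falling, k=2))  # → [0, 0, 0, 1, 1, 1]
```

In a pipeline like the one the abstract describes, IDs of this kind would serve as the labels that the intonation predictor learns to recommend from text and that the intonation encoder is conditioned on. How the paper actually extracts and clusters intonation features is not stated in this section.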