Paper title
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model
Paper authors
Paper abstract
Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, in this work we look specifically at the prediction of contrastive focus on personal pronouns. This is a particularly challenging task, as it often requires semantic, discourse and/or pragmatic knowledge to predict correctly. We collect a corpus of utterances containing contrastive focus and evaluate, on these samples, the accuracy of a BERT model fine-tuned to predict quantized acoustic prominence features. We also investigate how past utterances can provide relevant information for this prediction. Furthermore, we evaluate the controllability of pronoun prominence in a TTS model conditioned on acoustic prominence features.