Speech Synthesis with Mixed Emotions

Paper title

Speech Synthesis with Mixed Emotions

Authors

Zhou, Kun; Sisman, Berrak; Rana, Rajib; Schuller, B. W.; Li, Haizhou

Abstract

Emotional speech synthesis aims to synthesize human voices with various emotional effects. Current studies mostly focus on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between the speech samples of different emotions. We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework. During training, the framework not only explicitly characterizes emotion styles, but also explores the ordinal nature of emotions by quantifying the differences with other emotions. At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector. Objective and subjective evaluations validate the effectiveness of the proposed framework. To the best of our knowledge, this research is the first study on modelling, synthesizing, and evaluating mixed emotions in speech.
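
As a rough illustration of what "manually defining an emotion attribute vector" could look like, the sketch below builds such a vector from per-emotion degrees and uses it to weight per-emotion embeddings. Everything here is an assumption made for illustration: the emotion label set, the [0, 1] range of each degree, the function names `make_attribute_vector` and `mix_embeddings`, and the weighted-sum mixing rule are hypothetical and are not taken from the paper.

```python
# Hypothetical label set; the abstract does not list the emotions used.
EMOTIONS = ("happy", "sad", "angry", "surprise")


def make_attribute_vector(degrees):
    """Build an emotion attribute vector from an {emotion: degree} mapping.

    Assumption: each degree lies in [0, 1]; out-of-range values are clipped
    and emotions that are not listed default to 0.
    """
    unknown = set(degrees) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    return [min(1.0, max(0.0, float(degrees.get(e, 0.0)))) for e in EMOTIONS]


def mix_embeddings(attribute_vector, embeddings):
    """Weighted sum of per-emotion embedding rows (one row per emotion).

    Assumption: a plain weighted sum stands in for whatever conditioning
    the actual text-to-speech framework applies.
    """
    dim = len(embeddings[0])
    return [
        sum(w * row[d] for w, row in zip(attribute_vector, embeddings))
        for d in range(dim)
    ]


# Example: mostly happy with a trace of sadness.
vector = make_attribute_vector({"happy": 0.8, "sad": 0.3})
```

In this sketch the attribute vector is the only run-time control: changing the degrees changes the mixture without retraining anything.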
