Peking Opera Synthesis via Duration Informed Attention Network

Paper Title

Peking Opera Synthesis via Duration Informed Attention Network

Authors

Wu, Yusong; Li, Shengchen; Yu, Chengzhu; Lu, Heng; Weng, Chao; Zhang, Liqiang; Yu, Dong

Abstract

Peking Opera has been the most dominant form of Chinese performing art for around 200 years. A Peking Opera singer usually exhibits a very strong personal style by introducing improvisation and expressiveness on stage, which causes the actual rhythm and pitch contour to deviate significantly from the original music score. This inconsistency poses a great challenge for Peking Opera singing voice synthesis from a music score. In this work, we propose to address this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework. To tackle the rhythm mismatch, a Lagrange multiplier is used to find the optimal output phoneme duration sequence under the constraint of the given note duration from the music score. As for the pitch contour mismatch, instead of inferring the pitch contour directly from the music score, we adopt a pseudo music score generated from the real singing and feed it as input during training. The experiments demonstrate that the proposed system can synthesize Peking Opera singing voice with high-quality timbre, pitch and expressiveness.
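
The rhythm-matching step can be illustrated with a small sketch. The abstract does not state the objective that is optimized, so the version below assumes a simple one: keep each phoneme duration as close as possible (in squared error) to a predicted duration while forcing the durations to sum to the note duration. The function name `fit_durations` and the example values are hypothetical, and this should not be read as the authors' exact formulation.

```python
def fit_durations(predicted, note_duration):
    """Adjust predicted phoneme durations so they sum to note_duration.

    Assumed objective (not taken from the paper):
        minimise sum((d_i - p_i)**2)  subject to  sum(d_i) == note_duration

    Setting the gradient of the Lagrangian
        L = sum((d_i - p_i)**2) + lam * (sum(d_i) - note_duration)
    to zero gives d_i = p_i - lam / 2, and substituting into the
    constraint gives lam = 2 * (sum(p) - note_duration) / n.
    """
    n = len(predicted)
    if n == 0:
        raise ValueError("need at least one phoneme")
    lam = 2.0 * (sum(predicted) - note_duration) / n
    return [p - lam / 2.0 for p in predicted]


# Three phonemes predicted at 0.10 s, 0.30 s and 0.20 s inside a 0.90 s note.
durations = fit_durations([0.10, 0.30, 0.20], 0.90)
print(durations, sum(durations))
```

Under this assumed objective the mismatch is spread evenly across phonemes, so a short phoneme in a much shorter note could receive a negative duration; a practical system would need an additional non-negativity constraint or a weighted objective.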
