Learning Interpretable Representation for Controllable Polyphonic Music Generation

Paper Title


Learning Interpretable Representation for Controllable Polyphonic Music Generation

Authors

Wang, Ziyu, Wang, Dingsu, Zhang, Yixiao, Xia, Gus

Abstract


While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat-long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves a successful disentanglement and high-quality controlled music generation.
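The abstract describes splitting a piano segment into a chord factor and a texture factor, then recombining factors from different pieces for compositional style transfer. The Python sketch below illustrates only that swap pathway. It is a hand-coded symbolic toy and not the authors' VAE. Here the "chord factor" is simply the per-beat chord. The "texture factor" describes each note as (beat, chord-tone index, octave shift) relative to that chord. In the paper, by contrast, both factors are learned latent vectors. All names (`encode_chord`, `encode_texture`, `decode`, `swap`) are hypothetical. The segments are 2 beats long to keep the example short, not 8 beats as in the paper.

```python
from typing import List, Tuple

# Hypothetical toy representation (NOT the paper's input format or model).
Chord = Tuple[int, ...]               # MIDI pitches of one chord, low to high
Note = Tuple[int, int]                # (beat, MIDI pitch)
TextureEvent = Tuple[int, int, int]   # (beat, chord-tone index, octave shift)
Segment = Tuple[List[Chord], List[Note]]


def encode_chord(segment: Segment) -> List[Chord]:
    """Toy 'chord factor': the per-beat chord sequence."""
    chords, _notes = segment
    return list(chords)


def encode_texture(segment: Segment) -> List[TextureEvent]:
    """Toy 'texture factor': each note expressed relative to the chord on its beat."""
    chords, notes = segment
    events: List[TextureEvent] = []
    for beat, pitch in notes:
        for idx, tone in enumerate(chords[beat]):
            diff = pitch - tone
            if diff % 12 == 0:  # same pitch class as this chord tone
                events.append((beat, idx, diff // 12))
                break
        else:
            raise ValueError(f"pitch {pitch} at beat {beat} is not a chord tone")
    return events


def decode(chords: List[Chord], texture: List[TextureEvent]) -> Segment:
    """Realize a texture over a chord sequence (chord-tone index wraps for smaller chords)."""
    notes = [
        (beat, chords[beat][idx % len(chords[beat])] + 12 * octave)
        for beat, idx, octave in texture
    ]
    return (list(chords), notes)


def swap(chord_source: Segment, texture_source: Segment) -> Segment:
    """Harmony from one segment, texture from the other (same number of beats assumed)."""
    return decode(encode_chord(chord_source), encode_texture(texture_source))


if __name__ == "__main__":
    # Segment A: block chords, C major -> G major.
    a = ([(60, 64, 67), (55, 59, 62)],
         [(0, 60), (0, 64), (0, 67), (1, 55), (1, 59), (1, 62)])
    # Segment B: F major -> C major, root an octave down plus the fifth on top.
    b = ([(53, 57, 60), (48, 52, 55)],
         [(0, 41), (0, 60), (1, 36), (1, 55)])
    print(swap(a, b))  # ([(60, 64, 67), (55, 59, 62)], [(0, 48), (0, 67), (1, 43), (1, 62)])
```

The texture is stored relative to the chord. Swapping in a new chord sequence therefore re-voices the same pattern (root an octave down, fifth on top) over the new harmony. A learned model such as the one in the paper has to discover a comparable factorization from data rather than having it hard-coded.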
