Paper title
Learning Interpretable Representation for Controllable Polyphonic Music Generation
Paper authors
Abstract
While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat-long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves a successful disentanglement and high-quality controlled music generation.
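The controllable generation pathway described in the abstract can be illustrated with a toy sketch: encode two segments into a chord latent and a texture latent, then decode the chord latent of one together with the texture latent of the other. This is not the paper's architecture. The linear maps below are untrained random stand-ins for the encoder and decoder networks, and every name and dimension (`encode`, `decode`, `style_transfer`, `INPUT_DIM`, `CHORD_DIM`, `TEXTURE_DIM`) is a hypothetical choice made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 16    # hypothetical size of a flattened 8-beat segment feature
CHORD_DIM = 4     # hypothetical size of the chord latent
TEXTURE_DIM = 6   # hypothetical size of the texture latent

# Stand-in linear maps (assumption); a real model would use trained networks.
W_chord = rng.normal(size=(CHORD_DIM, INPUT_DIM))
W_texture = rng.normal(size=(TEXTURE_DIM, INPUT_DIM))
W_dec = rng.normal(size=(INPUT_DIM, CHORD_DIM + TEXTURE_DIM))


def encode(segment):
    """Return the two latent factors (z_chord, z_texture) for one segment."""
    return W_chord @ segment, W_texture @ segment


def decode(z_chord, z_texture):
    """Map the concatenated latent factors back to segment space."""
    return W_dec @ np.concatenate([z_chord, z_texture])


def style_transfer(segment_a, segment_b):
    """Combine the chord latent of segment_a with the texture latent of segment_b."""
    z_chord_a, _ = encode(segment_a)
    _, z_texture_b = encode(segment_b)
    return decode(z_chord_a, z_texture_b)


segment_a = rng.normal(size=INPUT_DIM)
segment_b = rng.normal(size=INPUT_DIM)
mixed = style_transfer(segment_a, segment_b)
```

Under the same assumptions, texture variation would keep `z_chord` fixed and replace `z_texture` with a fresh sample, and accompaniment arrangement would supply `z_chord` from a given chord progression. The sketch only shows where the swap happens; it does not produce music.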