Paper Title
Learning long-term music representations via hierarchical contextual constraints
Authors
Abstract
Learning symbolic music representations, especially disentangled representations with probabilistic interpretations, has been shown to benefit both music understanding and generation. However, most models are only applicable to short-term music, while learning long-term music representations remains a challenging task. We have seen several studies attempting to learn hierarchical representations directly in an end-to-end manner, but these models have not been able to achieve the desired results and the training process is not stable. In this paper, we propose a novel approach to learn long-term symbolic music representations through contextual constraints. First, we use contrastive learning to pre-train a long-term representation by constraining its difference from the short-term representation (extracted by an off-the-shelf model). Then, we fine-tune the long-term representation by a hierarchical prediction model such that a good long-term representation (e.g., an 8-bar representation) can reconstruct the corresponding short-term ones (e.g., the 2-bar representations within the 8-bar range). Experiments show that our method stabilizes the training and the fine-tuning steps. In addition, the designed contextual constraints benefit both reconstruction and disentanglement, significantly outperforming the baselines.
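The two-stage scheme described in the abstract can be illustrated with a minimal sketch. All names, dimensions, and loss forms below are illustrative assumptions, not the paper's actual architecture: a toy long-term encoder maps four 2-bar representations (as produced by some off-the-shelf short-term model) to one 8-bar code; an InfoNCE-style contrastive loss constrains the long-term code's difference from its own short-term representations against random negatives (the pre-training stage); and a mean-squared reconstruction loss asks the long-term code to predict back the 2-bar representations (the fine-tuning stage).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 two-bar segments per 8-bar phrase; each
# short-term representation is a 16-dim vector (assumed, not from the paper).
n_seg, d_short, d_long = 4, 16, 32

def encode_long(short_reps, W):
    """Toy long-term encoder: linear map over the concatenated 2-bar reps."""
    return np.tanh(W @ short_reps.reshape(-1))

def contrastive_loss(z_long, short_reps, P, negatives):
    """Pre-training stage (InfoNCE-style stand-in): the projected long-term
    code should be closer to the mean of its own short-term reps than to
    negatives drawn from other music."""
    anchor = P @ z_long                      # project into short-term space
    pos = short_reps.mean(axis=0)
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([sim(anchor, pos)] + [sim(anchor, n) for n in negatives])
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

def reconstruction_loss(z_long, short_reps, D):
    """Fine-tuning stage: a hierarchical predictor reconstructs each
    2-bar representation from the single 8-bar representation."""
    pred = (D @ z_long).reshape(n_seg, d_short)
    return float(np.mean((pred - short_reps) ** 2))

# Stand-in data: 2-bar reps as if extracted by an off-the-shelf model.
short_reps = rng.normal(size=(n_seg, d_short))
W = rng.normal(size=(d_long, n_seg * d_short)) * 0.1
P = rng.normal(size=(d_short, d_long)) * 0.1
D = rng.normal(size=(n_seg * d_short, d_long)) * 0.1
negatives = [rng.normal(size=d_short) for _ in range(8)]

z_long = encode_long(short_reps, W)
l_con = contrastive_loss(z_long, short_reps, P, negatives)
l_rec = reconstruction_loss(z_long, short_reps, D)
print(l_con, l_rec)
```

In the paper's actual pipeline these losses would drive gradient updates of a sequence model over symbolic music; the sketch only shows how the 8-bar code is tied to its 2-bar constituents in both stages.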