Paper Title
Learning long-term music representations via hierarchical contextual constraints
Authors
Abstract
Learning symbolic music representations, especially disentangled representations with probabilistic interpretations, has been shown to benefit both music understanding and generation. However, most models are only applicable to short-term music, while learning long-term music representations remains a challenging task. We have seen several studies attempting to learn hierarchical representations directly in an end-to-end manner, but these models have not been able to achieve the desired results and the training process is not stable. In this paper, we propose a novel approach to learn long-term symbolic music representations through contextual constraints. First, we use contrastive learning to pre-train a long-term representation by constraining its difference from the short-term representation (extracted by an off-the-shelf model). Then, we fine-tune the long-term representation by a hierarchical prediction model such that a good long-term representation (e.g., an 8-bar representation) can reconstruct the corresponding short-term ones (e.g., the 2-bar representations within the 8-bar range). Experiments show that our method stabilizes the training and the fine-tuning steps. In addition, the designed contextual constraints benefit both reconstruction and disentanglement, significantly outperforming the baselines.
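The two-stage scheme described in the abstract can be illustrated with a minimal sketch. All names, dimensions, and loss forms below are illustrative assumptions, not the paper's actual architecture: a toy long-term encoder maps four 2-bar representations (as produced by some off-the-shelf short-term model) to one 8-bar code; an InfoNCE-style contrastive loss constrains the long-term code's difference from its own short-term representations against random negatives (the pre-training stage); and a mean-squared reconstruction loss asks the long-term code to predict back the 2-bar representations (the fine-tuning stage).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 two-bar segments per 8-bar phrase; each
# short-term representation is a 16-dim vector (assumed, not from the paper).
n_seg, d_short, d_long = 4, 16, 32

def encode_long(short_reps, W):
    """Toy long-term encoder: linear map over the concatenated 2-bar reps."""
    return np.tanh(W @ short_reps.reshape(-1))

def contrastive_loss(z_long, short_reps, P, negatives):
    """Pre-training stage (InfoNCE-style stand-in): the projected long-term
    code should be closer to the mean of its own short-term reps than to
    negatives drawn from other music."""
    anchor = P @ z_long                      # project into short-term space
    pos = short_reps.mean(axis=0)
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    logits = np.array([sim(anchor, pos)] + [sim(anchor, n) for n in negatives])
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

def reconstruction_loss(z_long, short_reps, D):
    """Fine-tuning stage: a hierarchical predictor reconstructs each
    2-bar representation from the single 8-bar representation."""
    pred = (D @ z_long).reshape(n_seg, d_short)
    return float(np.mean((pred - short_reps) ** 2))

# Stand-in data: 2-bar reps as if extracted by an off-the-shelf model.
short_reps = rng.normal(size=(n_seg, d_short))
W = rng.normal(size=(d_long, n_seg * d_short)) * 0.1
P = rng.normal(size=(d_short, d_long)) * 0.1
D = rng.normal(size=(n_seg * d_short, d_long)) * 0.1
negatives = [rng.normal(size=d_short) for _ in range(8)]

z_long = encode_long(short_reps, W)
l_con = contrastive_loss(z_long, short_reps, P, negatives)
l_rec = reconstruction_loss(z_long, short_reps, D)
print(l_con, l_rec)
```

In the paper's actual pipeline these losses would drive gradient updates of a sequence model over symbolic music; the sketch only shows how the 8-bar code is tied to its 2-bar constituents in both stages.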