Unsupervised Generative Adversarial Alignment Representation for Sheet Music, Audio and Lyrics

Paper Title

Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics

Authors

Zeng, Donghuo; Yu, Yi; Oyama, Keizo

Abstract

Sheet music, audio, and lyrics are the three main modalities involved in writing a song. In this paper, we propose an unsupervised generative adversarial alignment representation (UGAAR) model that learns deep discriminative representations shared across these three musical modalities (sheet music, lyrics, and audio), using a jointly trained three-branch deep neural network architecture. In particular, by learning correlations in a latent shared subspace, the model transfers the strong relationship between audio and sheet music to the audio-lyrics and sheet-lyrics pairs. We use canonical correlation analysis (CCA) components of audio and sheet music to establish a new ground truth. The generative (G) model learns the correlation between the two transferred pairs and generates a new audio-sheet pair for fixed lyrics in order to challenge the discriminative (D) model, which aims to distinguish whether its input comes from the generative model or from the ground truth. The two models are trained simultaneously in an adversarial manner to strengthen deep alignment representation learning. Our experimental results demonstrate the feasibility of the proposed UGAAR model for alignment representation learning among sheet music, audio, and lyrics.
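The abstract says that CCA components of audio and sheet music are used to build a new ground truth. As a hedged illustration of what such components are, the sketch below implements generic linear CCA in NumPy (whitening each view, then taking an SVD of the whitened cross-covariance) on synthetic features. The function name `cca`, the ridge term `reg`, the feature dimensions, and the random data are all assumptions made for illustration; this is not the authors' implementation, and the abstract does not say how UGAAR turns these components into its ground truth.

```python
import numpy as np

def cca(X, Y, k, reg=1e-4):
    """Generic linear CCA (illustrative sketch, not the UGAAR code).

    X: (n, dx) features of one view, e.g. audio (assumed).
    Y: (n, dy) features of the other view, e.g. sheet music (assumed).
    Returns projections Wx (dx, k), Wy (dy, k) and the top-k canonical correlations.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Covariances with a small ridge term for numerical stability.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return Wx, Wy, s[:k]

# Synthetic "audio" and "sheet" features driven by two shared latent factors (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
audio = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))
sheet = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))

Wx, Wy, corrs = cca(audio, sheet, k=2)
# Project the centered data onto the canonical directions.
audio_c = (audio - audio.mean(axis=0)) @ Wx
sheet_c = (sheet - sheet.mean(axis=0)) @ Wy
print(corrs)
```

The whitening-plus-SVD formulation is chosen here because it returns all canonical pairs at once, ordered by correlation; the small `reg` term keeps the inverse square roots stable when feature dimensions are correlated or nearly degenerate. Because both synthetic views share the same two latent factors, the two canonical correlations in this toy example come out close to 1.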
