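Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation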

Paper Title

Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation

Authors

Peining Zhang, Junliang Guo, Linli Xu, Mu You, Junming Yin

Abstract

We consider a novel task of automatically generating text descriptions of music. Compared with other well-established text generation tasks such as image captioning, the scarcity of well-paired music and text datasets makes it a much more challenging task. In this paper, we exploit crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music. More concretely, we use dilated convolutional layers as the basic component of the encoder and a memory-based recurrent neural network as the decoder. To enhance the authenticity and thematicity of the generated texts, we further propose to fine-tune the model with a discriminator as well as a novel topic evaluator. To measure the quality of the generated texts, we also propose two new evaluation metrics, which are more aligned with human evaluation than traditional metrics such as BLEU. Experimental results verify that our model is capable of generating fluent and meaningful comments that capture the thematic and content information of the original music.
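The abstract names dilated convolutional layers as the encoder's basic building block but gives no further detail. The sketch below illustrates why dilation suits long music sequences: stacking layers with doubling dilation makes the receptive field grow exponentially with depth. It is a minimal pure-Python sketch and makes several assumptions that do not come from the paper. It uses 1-D causal convolutions over a sequence of scalar feature frames, kernel size 2, dilations 1, 2, 4, 8 (a WaveNet-style choice), and ReLU between layers. The functions `dilated_conv1d`, `receptive_field`, and `encode` are hypothetical names, not the authors' implementation.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal 1-D dilated convolution with implicit zero left-padding.

    y[t] = sum_k kernel[k] * x[t - k * dilation]; indices below 0 count as zero,
    so the output has the same length as the input.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input frames that can influence a single output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def encode(frames: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Hypothetical encoder: stacked dilated convolutions with ReLU in between."""
    h = frames
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


if __name__ == "__main__":
    # An impulse at frame 0 reaches every one of the 16 output frames after
    # four layers, matching the receptive field 1 + 1 + 2 + 4 + 8 = 16.
    impulse = [1.0] + [0.0] * 15
    print(encode(impulse, [1.0, 1.0], [1, 2, 4, 8]))
    print(receptive_field(2, [1, 2, 4, 8]))
```

With only four layers, each output frame already depends on 16 input frames. An undilated stack would need 15 layers of kernel size 2 to cover the same span, which is the usual motivation for dilation when encoding long inputs such as audio.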
