Paper Title


Multimodal Metric Learning for Tag-based Music Retrieval

Authors

Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra

Abstract


Tag-based music retrieval is crucial for browsing large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has already proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500, a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
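The core mechanism the abstract describes, pulling a track's audio embedding toward the word embedding of a matching tag and pushing it away from a non-matching one in a shared space, can be sketched with a triplet margin loss on cosine distance. This is a minimal illustration, not the paper's implementation; the margin value, embedding dimensionality, and toy vectors below are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.4):
    """Triplet margin loss on cosine distance.

    anchor:   e.g., an audio embedding of a track
    positive: word embedding of a tag that matches the track
    negative: word embedding of a tag that does not match
    The margin (0.4 here) is an illustrative choice, not the paper's.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Distance = 1 - cosine similarity; penalize when the positive pair
    # is not closer than the negative pair by at least the margin.
    d_pos = 1.0 - cos(anchor, positive)
    d_neg = 1.0 - cos(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-d embeddings: the anchor nearly aligns with `positive`
# and is orthogonal to `negative`, so the triplet is satisfied.
anchor   = np.array([1.0, 0.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0, 0.0])
negative = np.array([0.0, 1.0, 0.0, 0.0])
loss = triplet_loss(anchor, positive, negative)
```

In a full system, the anchor would come from a trainable audio encoder and the tag embeddings from pretrained word vectors, with gradients of this loss shaping the joint space; the "elaborate triplet sampling" the abstract mentions concerns how the positive and negative tags are chosen per anchor.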
