Paper title
Learning Unsupervised Hierarchies of Audio Concepts
Paper authors
Paper abstract
Music signals are difficult to interpret from their low-level features, perhaps even more so than images: for example, highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was proposed to adjust explanations to the right level of abstraction (e.g. detecting clinical concepts from radiographs). These methods have yet to be applied to music information retrieval (MIR). In this paper, we adapt concept learning to the realm of music, taking its particularities into account. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), whereas previous work assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, which serve as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned both with ground-truth hierarchies of concepts, when available, and with proxy sources of concept similarity in the general case.