Title
Singer Identification Using Convolutional Acoustic Motif Embeddings
Authors
Abstract
Flamenco singing is characterized by pitch instability, microtonal ornamentation, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult computational task. In this article, we present an end-to-end pipeline for flamenco singer identification based on acoustic motif embeddings. In our approach, the fundamental frequency obtained directly from the raw audio signal is approximated. This approximation reduces the high variability of the audio signal and allows small melodic patterns to be discovered with a sequential pattern mining technique, yielding a dictionary of motifs. Convolutional architectures are then applied to several acoustic features to extract fixed-length embeddings of the variable-length motifs. We test the quality of the embeddings in a flamenco singer identification task, compare our approach with previous deep learning architectures, and study the effect of motivic patterns and acoustic features on identification. Results indicate that motivic patterns play a crucial role in identifying flamenco singers: they minimize the size of the signal to be learned and discard information that is not relevant to the identification task. The presented deep learning architecture outperforms denser models used in large-scale audio classification problems.
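The abstract says convolutional architectures turn variable-length motifs into fixed-length embeddings but does not say how. One common mechanism is to slide filters along the motif's frames of acoustic features and then pool over time, so the output size depends only on the number of filters, not on the motif length. The sketch below assumes global max pooling, a frames-by-features input layout, and random (untrained) weights; `conv1d_relu`, `embed`, and all sizes are hypothetical illustrations, not the authors' architecture.

```python
import random
from typing import List

# Hypothetical layout: a motif is a list of frames, each frame a list of
# acoustic feature values. A kernel spans K frames x F features.
Frames = List[List[float]]
Kernel = List[List[float]]


def conv1d_relu(x: Frames, kernels: List[Kernel], bias: List[float]) -> Frames:
    """'Valid' 1-D convolution along time followed by ReLU.

    Returns (T - K + 1) frames, each holding one activation per filter.
    """
    k = len(kernels[0])
    if len(x) < k:
        raise ValueError(f"motif has {len(x)} frames; need at least {k}")
    out: Frames = []
    for t in range(len(x) - k + 1):
        row = []
        for w, b in zip(kernels, bias):
            s = b
            for i in range(k):
                for f, v in enumerate(x[t + i]):
                    s += w[i][f] * v
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def embed(x: Frames, kernels: List[Kernel], bias: List[float]) -> List[float]:
    """Global max pooling over time (an assumed pooling choice): one value
    per filter, so the embedding length is independent of the motif length."""
    h = conv1d_relu(x, kernels, bias)
    return [max(frame[c] for frame in h) for c in range(len(kernels))]


if __name__ == "__main__":
    rng = random.Random(0)
    n_feat, k, n_filters = 3, 2, 4  # hypothetical sizes
    # Random, untrained weights: this only demonstrates output shapes.
    kernels = [[[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(k)]
               for _ in range(n_filters)]
    bias = [0.0] * n_filters
    short_motif = [[rng.random() for _ in range(n_feat)] for _ in range(5)]
    long_motif = [[rng.random() for _ in range(n_feat)] for _ in range(12)]
    print(len(embed(short_motif, kernels, bias)),
          len(embed(long_motif, kernels, bias)))  # prints: 4 4
```

Max pooling is only one option; average pooling or an attention layer would also yield a fixed-size vector. Learning the filter weights, for example from singer labels, is outside the scope of this sketch.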