Paper Title
Addressing the confounds of accompaniments in singer identification
Paper Authors
Paper Abstract
Identifying singers is an important task with many applications. However, the task remains challenging due to many issues. One major issue concerns the confounding factors from the background instrumental music that is mixed with the vocals in music production. If a singer only sings in certain musical contexts (e.g., genres), a singer identification model may learn to extract non-vocal-related features from the instrumental part of the songs. The model therefore cannot generalize well when the singer sings in unseen contexts. In this paper, we attempt to address this issue. Specifically, we employ open-unmix, an open-source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two means to train a singer identification model: by learning from the separated vocals only, or from an augmented dataset where we "shuffle-and-remix" the separated vocal tracks and instrumental tracks of different songs to artificially make the singers sing in different contexts. We also incorporate melodic features learned from the vocal melody contour for better performance. Evaluation results on the benchmark artist20 dataset show that this data augmentation method greatly improves the accuracy of singer identification.
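To make the "shuffle-and-remix" step above concrete, the Python sketch below pairs each song's separated vocal stem with the instrumental stem of a randomly chosen different song and sums the two waveforms, so the same singer appears over new accompaniments. This is a minimal sketch under stated assumptions, not the paper's actual pipeline: the directory layout, the vocals.wav/accompaniment.wav file names, the shuffle_and_remix helper, and the peak-normalization step are all illustrative assumptions, and the stems are presumed to have been produced beforehand by a separation tool such as open-unmix.

```python
# shuffle_remix.py -- minimal sketch of the "shuffle-and-remix" augmentation.
# Assumes each song directory already contains separated stems named
# vocals.wav and accompaniment.wav (e.g., produced with open-unmix);
# these names and this helper are assumptions for illustration only.
import random
from pathlib import Path

import numpy as np
import soundfile as sf


def shuffle_and_remix(song_dirs, out_dir, seed=0):
    """Mix each song's vocal stem with the instrumental stem of a
    randomly chosen *different* song; the remix keeps the vocal's
    singer label, placing that singer in a new musical context."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for vocal_dir in song_dirs:
        # Borrow the accompaniment from a different song
        # (requires at least two songs in song_dirs).
        other = rng.choice([d for d in song_dirs if d != vocal_dir])

        vocals, sr_v = sf.read(Path(vocal_dir) / "vocals.wav")
        accomp, sr_a = sf.read(Path(other) / "accompaniment.wav")
        assert sr_v == sr_a, "stems must share a sample rate"
        # Assumes both stems also share a channel layout (e.g., stereo).

        # Trim both stems to the shorter length, then sum the waveforms.
        n = min(len(vocals), len(accomp))
        mix = vocals[:n] + accomp[:n]

        # Rescale only if summation pushed the signal past full scale.
        peak = np.max(np.abs(mix))
        if peak > 1.0:
            mix = mix / peak

        name = f"{Path(vocal_dir).name}__over__{Path(other).name}.wav"
        sf.write(out_dir / name, mix, sr_v)
```

A model trained on such remixed clips hears each singer over many different accompaniments, which matches the abstract's stated goal: discouraging the classifier from keying on instrumental context rather than the voice itself.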