Paper Title
Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments
Paper Authors
Paper Abstract
Choral music separation refers to the task of extracting the tracks of individual voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. Due to copyright issues and the difficulty of dataset collection, the lack of datasets has impeded research on this topic: previous work has only been able to train and evaluate models on a few minutes of choral music data. In this paper, we investigate the use of synthesized training data for the source separation task on real choral music. We make three contributions. First, we provide an automated pipeline for synthesizing choral music data from sampled instrument plugins, with controllable options for instrument expressiveness. This produces an 8.2-hour choral music dataset from the JSB Chorales Dataset, and additional data can easily be synthesized. Second, we conduct an experiment evaluating multiple separation models on the choral music separation datasets available from previous work. To the best of our knowledge, this is the first experiment to comprehensively evaluate choral music separation. Third, the experiments demonstrate that the synthesized choral data is of sufficient quality to improve model performance on real choral music datasets. This provides additional experimental statistics and data support for choral music separation research.
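To illustrate the general idea of synthesizing separation training data from symbolic scores, below is a minimal sketch, not the authors' pipeline: it renders each SATB track of a JSB Chorales MIDI file with a sampled SoundFont (standing in for the sampled instrument plugins mentioned in the abstract), then mixes the stems into a (mixture, sources) pair. The file names `jsb_chorale_001.mid` and `voice_ahh.sf2` are hypothetical placeholders.

```python
# Sketch: build one (mixture, sources) training example from a four-part MIDI chorale.
# Assumes pretty_midi (with fluidsynth support), numpy, and soundfile are installed.
import numpy as np
import pretty_midi
import soundfile as sf

FS = 22050  # rendering sample rate

pm = pretty_midi.PrettyMIDI("jsb_chorale_001.mid")  # hypothetical JSB Chorales file
part_names = ["soprano", "alto", "tenor", "bass"]

# Render each MIDI track (one per voice part) with a sampled SoundFont.
stems = [inst.fluidsynth(fs=FS, sf2_path="voice_ahh.sf2") for inst in pm.instruments[:4]]

# Pad stems to a common length and sum them into the mixture.
length = max(len(s) for s in stems)
stems = [np.pad(s, (0, length - len(s))) for s in stems]
mixture = np.sum(stems, axis=0)

# Peak-normalize to avoid clipping, then write the mixture and isolated sources.
peak = max(np.abs(mixture).max(), 1e-9)
sf.write("mixture.wav", mixture / peak, FS)
for name, stem in zip(part_names, stems):
    sf.write(f"{name}.wav", stem / peak, FS)
```

In the paper's actual pipeline, the rendering step would go through sampled instrument plugins with expressiveness controls rather than a single SoundFont, but the resulting stem/mixture pairs serve the same purpose as separation training data.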