Music Source Separation with Band-split RNN

Paper title

Music Source Separation with Band-split RNN

Authors

Luo, Yi; Yu, Jianwei

Abstract

The performance of music source separation (MSS) models has improved greatly in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of music signals have not been fully explored. In this paper, we propose band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen using a priori or expert knowledge of the characteristics of the target source, in order to optimize performance on a certain type of target musical instrument. To make better use of unlabeled data, we also describe a semi-supervised model finetuning pipeline that can further improve the performance of the model. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models in the Music Demixing (MDX) Challenge 2021, and the semi-supervised finetuning stage further improves the performance on all four instrument tracks.
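
The band-split step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `split_into_subbands`, the toy spectrogram size, and the band widths are all hypothetical choices made for illustration, and the interleaved band-level and sequence-level RNN modeling is not shown.

```python
import numpy as np

def split_into_subbands(spec, band_widths):
    """Split a (freq_bins, frames) spectrogram into consecutive subbands.

    band_widths lists the number of frequency bins in each subband and
    must sum to the number of frequency bins in spec.
    (Hypothetical helper, not taken from the paper.)
    """
    if sum(band_widths) != spec.shape[0]:
        raise ValueError("band widths must cover every frequency bin exactly once")
    edges = np.cumsum(band_widths)[:-1]  # split points between adjacent bands
    return np.split(spec, edges, axis=0)

# Toy complex spectrogram: 16 frequency bins, 5 time frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((16, 5)) + 1j * rng.standard_normal((16, 5))

# Assumed layout: narrow bands at low frequencies, wider bands higher up.
band_widths = [2, 2, 4, 8]
subbands = split_into_subbands(spec, band_widths)
shapes = [band.shape for band in subbands]
```

In a full model each subband would presumably be mapped to a common feature size before any band-level or sequence-level modeling; the uneven widths here only illustrate how prior knowledge about a target instrument could shape the split.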
