MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

Paper Title

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

Authors

Zhang, Xulong; Wang, Jianzong; Cheng, Ning; Xiao, Jing

Abstract

Most singer identification methods operate in the frequency domain, which can lose information during the spectral transformation. In this paper, we propose an end-to-end architecture that addresses this problem in the waveform domain instead. An encoder based on Multi-scale Dilated Convolution Neural Networks (MDCNN) is introduced to generate a wave embedding from the raw audio signal. Specifically, dilated convolution layers are used to enlarge the receptive field, with the aim of extracting song-level features. Furthermore, skip connections in the backbone network integrate the multi-resolution acoustic features learned by the stack of convolution layers. The resulting wave embedding is then passed to the downstream networks for singer identification. In experiments on the Artist20 benchmark dataset, the proposed method achieves comparable performance and significantly improves on related work.
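The two architectural ideas in the abstract (dilation to grow the receptive field, skip connections to combine features from several resolutions) can be illustrated with a minimal, framework-free Python sketch. It assumes a single-channel, causal, stride-1 stack with kernel size 2 and doubling dilations (a WaveNet-style layout), for which the receptive field is R = 1 + (K - 1) * (d_1 + ... + d_L) samples. The layer count, the random weights, the choice to mean-pool each skip output over time and concatenate the results, and all function names (`receptive_field`, `dilated_conv1d`, `wave_embedding`) are hypothetical illustrations, not the authors' MDCNN.

```python
import math
import random


def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, weights, dilation):
    """Causal dilated 1-D convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zeros, so the output has the same length as x.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(v):
    return [max(0.0, a) for a in v]


def wave_embedding(x, layer_weights, dilations):
    """Hypothetical single-channel encoder (an illustration, not the paper's network).

    Each layer applies a dilated convolution + ReLU with a residual connection; its
    output is also tapped as a skip connection. Each skip output is mean-pooled over
    time (a stand-in for song-level pooling) and the pooled values are concatenated,
    so the embedding mixes features from several receptive-field sizes.
    """
    h = list(x)
    embedding = []
    for weights, d in zip(layer_weights, dilations):
        out = relu(dilated_conv1d(h, weights, d))
        embedding.append(sum(out) / len(out))  # skip tap, pooled over time
        h = [a + b for a, b in zip(h, out)]    # residual connection
    return embedding


# Usage: six layers with kernel size 2 and dilations 1, 2, 4, ..., 32.
dilations = [1, 2, 4, 8, 16, 32]
print(receptive_field(2, dilations))  # 64

random.seed(0)
# 0.1 s of a 440 Hz tone sampled at 16 kHz, standing in for raw singing audio.
wave = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
weights = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in dilations]
emb = wave_embedding(wave, weights, dilations)
print(len(emb))  # 6
```

Doubling the dilation lets the receptive field grow exponentially with depth while the number of weights grows only linearly, which is why dilated stacks are a common way to reach long temporal context directly on waveforms. In a real encoder each layer would have many channels, and a classifier would map the embedding to singer labels.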
