An Ensemble of Convolutional Neural Networks for Audio Classification

Paper Title

An Ensemble of Convolutional Neural Networks for Audio Classification

Authors

Loris Nanni, Gianluca Maguolo, Sheryl Brahnam, Michelangelo Paci

Abstract

In this paper, ensembles of classifiers that exploit several data augmentation techniques and four signal representations for training Convolutional Neural Networks (CNNs) for audio classification are presented and tested on three freely available audio classification datasets: i) bird calls, ii) cat sounds, and iii) the Environmental Sound Classification dataset. The best performing ensembles combining data augmentation techniques with different signal representations are compared and shown to outperform the best methods reported in the literature on these datasets. The approach proposed here obtains state-of-the-art results on the widely used ESC-50 dataset. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. Results demonstrate not only that CNNs can be trained for audio classification but also that their fusion using different techniques works better than the stand-alone classifiers.
