Paper title
Low-complexity deep learning frameworks for acoustic scene classification
Paper authors
Paper abstract
In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we first transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods of Random Cropping, SpecAugment, and Mixup are applied to generate augmented spectrograms before they are fed into deep-learning-based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, each independently trained on one of the three types of spectrograms. Our experiments conducted on the DCASE 2022 Task 1 Development dataset fulfil the low-complexity requirement and achieve a best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2%.
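Two of the steps named above, Mixup augmentation and late fusion of the three classifiers' probabilities, can be sketched as follows. This is a minimal illustration, not the report's implementation: the `alpha` value and the mean-fusion rule are assumptions, and the actual report may weight or combine the classifiers differently.

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.4, rng=np.random.default_rng(0)):
    """Mixup: blend two spectrograms and their one-hot labels with a
    Beta-distributed weight. alpha=0.4 is an assumed hyperparameter."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def late_fusion(prob_mel, prob_gam, prob_cqt):
    """Mean late fusion of per-class probabilities from the three
    spectrogram-specific classifiers (Mel, Gammatone, CQT). Averaging
    is one common fusion rule; the report's exact rule may differ."""
    fused = (prob_mel + prob_gam + prob_cqt) / 3.0
    return int(np.argmax(fused)), fused
```

Because each classifier sees a different time-frequency representation, their errors tend to be partially uncorrelated, which is why averaging their predicted probabilities can outperform any single classifier.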