Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Paper title

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Authors

Hu, Hu; Yang, Chao-Han Huck; Xia, Xianjun; Bai, Xue; Tang, Xin; Wang, Yajian; Niu, Shutong; Chai, Li; Li, Juanjuan; Zhu, Hongning; Bao, Feng; Zhao, Yuanjun; Siniscalchi, Sabato Marco; Wang, Yannan; Du, Jun; Lee, Chin-Hui

Abstract

In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1, Acoustic Scene Classification (ASC), in the DCASE 2020 Challenge. Task 1 comprises two sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten fine-grained classes, and (ii) Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system that leverages an ad-hoc score combination of two convolutional neural networks (CNNs), which classify the acoustic input into three classes and into ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we use a quantization method to reduce the complexity of two of our most accurate three-class CNN-based architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is attained using our best single classifier and data augmentation. An accuracy of 81.9% is then attained by a final model fusion of our two-stage ASC classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500 KB. Code is available at https://github.com/MihawkHu/DCASE2020_task1.
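
The abstract does not spell out how the three-class and ten-class scores are combined, so the following is only a minimal sketch of one plausible scheme and not the authors' implementation. It assumes a simple product rule: each fine-grained score is weighted by the score of its parent coarse class and the result is renormalized. The function names `combine_scores` and `predict` are hypothetical; the grouping of the ten scenes under indoor, outdoor, and transportation follows the DCASE 2020 Task 1 label set.

```python
# Hypothetical sketch of a two-stage score combination; not the authors' code.

# DCASE 2020 Task 1 scene labels grouped under the three Task 1b classes.
COARSE_TO_FINE = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["street_pedestrian", "public_square", "street_traffic", "park"],
    "transportation": ["tram", "bus", "metro"],
}


def combine_scores(coarse_probs, fine_probs):
    """Weight each fine-class score by the score of its parent coarse class.

    The product rule used here is an assumption made for illustration.
    """
    combined = {}
    for coarse, fines in COARSE_TO_FINE.items():
        for fine in fines:
            combined[fine] = coarse_probs[coarse] * fine_probs[fine]
    total = sum(combined.values())
    # Renormalize so the combined scores sum to one.
    return {label: score / total for label, score in combined.items()}


def predict(coarse_probs, fine_probs):
    """Return the fine-grained label with the highest combined score."""
    combined = combine_scores(coarse_probs, fine_probs)
    return max(combined, key=combined.get)
```

Under this assumed rule, a confident three-class decision can override a confusable ten-class decision: if the ten-class model slightly prefers `metro_station` over `metro` but the three-class model strongly favors `transportation`, the combined prediction becomes `metro`.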
