Paper title
Acoustic Scene Classification with Squeeze-Excitation Residual Networks
Paper authors
Paper abstract
Acoustic scene classification (ASC) is a machine listening problem whose objective is to tag an audio clip with one of a set of predefined labels describing the scene location (e.g. park, airport). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved solely by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework built on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently rather than jointly, as standard CNNs do. This is usually achieved with global grouping operators, linear operators, and a final calibration between the input of the block and the relationships it has learned. The behavior of the block that implements these operators, and therefore of the entire neural network, can be modified depending on the input to the block, the established residual configurations and the selected non-linear activations. The analysis was carried out using the TAU Urban Acoustic Scenes 2019 dataset (https://zenodo.org/record/2589280), presented in the 2019 edition of the DCASE challenge. All configurations discussed in this document exceed the performance of the baseline proposed by the DCASE organization by 13 percentage points. In turn, the novel configurations proposed in this paper outperform the residual configurations proposed in previous works.
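The three ingredients named in the abstract (a global grouping operator, linear operators, and a final calibration against the block input) can be illustrated with a minimal NumPy sketch of a standard channel-wise squeeze-excitation block. This is not one of the two novel blocks proposed in the paper, whose details the abstract does not give. The function name `channel_se`, the reduction ratio `r`, the use of global average pooling as the grouping operator, the ReLU and sigmoid activations, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, b1, w2, b2):
    """Apply a channel-wise squeeze-excitation step to a feature map.

    x  : array of shape (H, W, C), a single feature map
    w1 : array of shape (C, C // r), first linear operator (bottleneck)
    w2 : array of shape (C // r, C), second linear operator (expansion)
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average over the two spatial axes -> one value per channel
    z = x.mean(axis=(0, 1))
    # Excitation: two linear operators, ReLU in between, sigmoid at the end
    h = np.maximum(0.0, z @ w1 + b1)
    gates = sigmoid(h @ w2 + b2)
    # Calibration: rescale every channel of the input by its gate in (0, 1)
    return x * gates, gates


# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)

y, gates = channel_se(x, w1, b1, w2, b2)
print(y.shape, gates.shape)
```

The output keeps the shape of the input, so a block of this kind can be placed inside a residual branch without changing the surrounding layers. Where it sits relative to the skip connection is one of the residual configurations the abstract refers to.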