Paper Title
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
Paper Authors
Paper Abstract
In acoustic scene classification (ASC), acoustic features play a crucial role in extracting scene information, which can be stored over different time scales. Moreover, the limited size of the dataset may lead to a biased model that performs poorly on recordings from unseen cities and on easily confused scene classes. To overcome this, we propose a long-term wavelet feature that requires less storage and can be classified faster and more accurately than classic Mel filter bank coefficients (FBank). This feature can be extracted with predefined wavelet scales, similar to the FBank. Furthermore, a novel data augmentation scheme based on generative adversarial neural networks with auxiliary classifiers (ACGANs) is adopted to improve the generalization of ASC systems. The scheme, which contains ACGANs and a sample filter, extends the database iteratively by splitting the dataset, training the ACGANs, and subsequently filtering samples. Experiments were conducted on datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. The results on the DCASE19 dataset demonstrate the improved performance of the proposed techniques compared with the classic FBank classifier. Moreover, the proposed fusion system achieved first place in the DCASE19 competition and surpassed the top accuracies on the DCASE17 dataset.
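The iterative augmentation loop described in the abstract (split the dataset, train a generator, filter the generated samples, extend the database) can be illustrated with a minimal sketch. The abstract does not specify how the split is made, how the ACGANs are built, or what criterion the sample filter uses, so everything below is an assumption made for illustration: a per-class Gaussian stands in for the ACGAN, a nearest-centroid classifier stands in for the sample filter, the split is a stratified half/half split, and all function names (`fit_generator`, `sample_filter`, `augment`, and so on) are hypothetical.

```python
import numpy as np


def stratified_halves(y, n_classes, rng):
    """Split indices into two halves, class by class (assumed split strategy)."""
    first, second = [], []
    for c in range(n_classes):
        idx = rng.permutation(np.flatnonzero(y == c))
        first.append(idx[: len(idx) // 2])
        second.append(idx[len(idx) // 2:])
    return np.concatenate(first), np.concatenate(second)


def fit_generator(x, y, n_classes):
    """Stand-in for ACGAN training: store a per-class mean and std."""
    return {
        c: (x[y == c].mean(axis=0), x[y == c].std(axis=0) + 1e-6)
        for c in range(n_classes)
    }


def generate(stats, n_per_class, rng):
    """Draw class-conditional fake samples from the stand-in generator."""
    xs, ys = [], []
    for c, (mu, sigma) in stats.items():
        xs.append(rng.normal(mu, sigma, size=(n_per_class, mu.shape[0])))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)


def fit_centroids(x, y, n_classes):
    """Stand-in filter model: one centroid per class."""
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])


def sample_filter(centroids, x_fake, y_fake):
    """Keep only fake samples whose nearest centroid matches their intended class."""
    dists = np.linalg.norm(x_fake[:, None, :] - centroids[None, :, :], axis=2)
    keep = dists.argmin(axis=1) == y_fake
    return x_fake[keep], y_fake[keep]


def augment(x, y, n_classes, n_rounds=3, n_per_class=20, seed=0):
    """Iteratively extend (x, y): split, fit generator, generate, filter, append."""
    rng = np.random.default_rng(seed)
    x_aug, y_aug = x.copy(), y.copy()
    for _ in range(n_rounds):
        gen_idx, flt_idx = stratified_halves(y_aug, n_classes, rng)
        stats = fit_generator(x_aug[gen_idx], y_aug[gen_idx], n_classes)
        centroids = fit_centroids(x_aug[flt_idx], y_aug[flt_idx], n_classes)
        x_fake, y_fake = generate(stats, n_per_class, rng)
        x_keep, y_keep = sample_filter(centroids, x_fake, y_fake)
        x_aug = np.concatenate([x_aug, x_keep])
        y_aug = np.concatenate([y_aug, y_keep])
    return x_aug, y_aug
```

Calling `augment(x, y, n_classes)` on a feature matrix `x` of shape `(n_samples, n_features)` with integer labels `y` returns the original data followed by the generated samples that passed the filter. In a real system the stand-ins would be replaced by trained ACGANs and a proper scene classifier; the sketch only shows the control flow of the loop.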