Paper Title
How Information on Acoustic Scenes and Sound Events Mutually Benefits Event Detection and Scene Classification Tasks
Paper Authors
Paper Abstract
Acoustic scene classification (ASC) and sound event detection (SED) are fundamental tasks in environmental sound analysis, and many methods based on deep learning have been proposed. Considering that information on acoustic scenes and sound events helps SED and ASC mutually, some researchers have proposed a joint analysis of acoustic scenes and sound events by multitask learning (MTL). However, conventional works have not investigated in detail how acoustic scenes and sound events mutually benefit SED and ASC. We, therefore, investigate the impact of information on acoustic scenes and sound events on the performance of SED and ASC by using domain adversarial training based on a gradient reversal layer (GRL) or model training with fake labels. Experimental results obtained using the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 show that pieces of information on acoustic scenes and sound events are effectively used to detect sound events and classify acoustic scenes, respectively. Moreover, upon comparing GRL- and fake-label-based methods with single-task-based ASC and SED methods, single-task-based methods are found to achieve better performance. This result implies that even when using single-task-based ASC and SED methods, information on acoustic scenes may be implicitly utilized for SED and vice versa.
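The gradient reversal layer (GRL) used in the domain adversarial training mentioned above acts as the identity in the forward pass and flips the sign of the gradient (scaled by a factor λ) in the backward pass, so the shared feature extractor is pushed to discard the adversarial target's information. A minimal NumPy sketch of the two passes, with illustrative function names not taken from the paper:

```python
import numpy as np

def grl_forward(x):
    # Forward pass: identity — features pass through unchanged.
    return x

def grl_backward(grad_output, lam=1.0):
    # Backward pass: reverse the gradient and scale it by lambda,
    # training the feature extractor to remove the information the
    # auxiliary (scene or event) classifier relies on.
    return -lam * grad_output

# Toy feature vector and an upstream gradient flowing back through the layer.
features = np.array([0.5, -1.2, 3.0])
grad = np.array([0.1, 0.2, -0.3])

out = grl_forward(features)        # identical to the input
rev = grl_backward(grad, lam=0.5)  # gradient reversed and scaled
```

In a deep learning framework this would be implemented as a custom autograd operation; the sketch only shows the mathematical behavior that distinguishes the GRL-based models from the single-task baselines compared in the abstract.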