Paper Title
Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework
Authors
Abstract
Acoustic events are sounds with well-defined spectro-temporal characteristics that can be associated with the physical objects generating them. Acoustic scenes are collections of such acoustic events in no specific temporal order. Given this natural link between events and scenes, a common belief is that the ability to classify events must help in classifying scenes. This has led to several efforts to perform well on both Acoustic Event Tagging (AET) and Acoustic Scene Classification (ASC) using a multi-task network. In these efforts, however, an improvement in one task does not guarantee an improvement in the other, suggesting a tension between ASC and AET. It is unclear whether improvements in AET translate to improvements in ASC. We explore this conundrum through an extensive empirical study and show that, under certain conditions, using AET as an auxiliary task in the multi-task network consistently improves ASC performance. Additionally, ASC performance improves further as the AET dataset grows, and it is not sensitive to the choice of events or the number of events in the AET dataset. We conclude that this improvement in ASC performance comes from the regularization effect of using AET, not from the network's improved ability to discriminate between acoustic events.