Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation

Paper Title

Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation

Authors

Nada, Kayo; Imoto, Keisuke; Tsuchiya, Takao

Abstract

Acoustic scene classification (ASC) and sound event detection (SED) are major topics in environmental sound analysis. Because acoustic scenes and sound events are closely related, some previous works have proposed the joint analysis of acoustic scenes and sound events using multitask learning (MTL)-based neural networks. Conventional methods train MTL-based models using a linear combination of the ASC and SED loss functions with constant weights. However, the performance of conventional MTL-based methods depends strongly on the weights of the ASC and SED losses, and it is difficult to determine an appropriate balance between these constant weights. In this paper, we thus propose dynamic weight adaptation methods for MTL of ASC and SED, based on dynamic weight average and multi-focal loss, that adjust the learning weights automatically. Evaluation experiments using parts of the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 datasets are conducted, and we show that the proposed methods improve scene classification and event detection performance compared with the conventional MTL-based method. We then investigate how the learning weights of the ASC and SED tasks dynamically adapt as model training progresses.
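
To make the idea of dynamic weight averaging concrete, the following is a minimal sketch of the general formulation commonly used in the multitask learning literature: each task's weight is a softmax over the ratio of its two most recent losses, scaled so that the weights sum to the number of tasks. The abstract does not give the authors' exact formulation, so the function names, the temperature value, and the two-task ASC/SED ordering here are illustrative assumptions rather than the paper's implementation.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Return one weight per task using dynamic weight averaging.

    prev_losses:      per-task losses from epoch t-1
    prev_prev_losses: per-task losses from epoch t-2
    temperature:      softness of the softmax (assumed value, not from the paper)
    """
    # Relative descent rate of each task: close to 1 means the loss is stalling.
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    # Scale so that the weights sum to the number of tasks.
    return [num_tasks * e / total for e in exps]


def combined_loss(asc_loss, sed_loss, weights):
    """Linear combination of the ASC and SED losses (hypothetical ordering)."""
    return weights[0] * asc_loss + weights[1] * sed_loss


# Example: the ASC loss fell slowly (1.0 to 0.9), the SED loss fell quickly (1.0 to 0.5).
weights = dwa_weights([0.9, 0.5], [1.0, 1.0])
total = combined_loss(0.9, 0.5, weights)
```

In this sketch the task whose loss is decreasing more slowly receives the larger weight, which is the behavior that replaces the hand-tuned constant weights of the conventional approach.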
