Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

Paper title

Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

Authors

Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

Abstract

End-to-end (E2E) neural network models have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model to a task or the same E2E architecture to different tasks. However, a single model is unstable, and reusing the same architecture under-utilizes task-specific information. On the ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task. ComParE 2020 introduces three sub-challenges: the breathing sub-challenge, to predict the output of a respiratory belt worn by a patient while speaking; the elderly sub-challenge, to estimate the elderly speaker's arousal and valence levels; and the mask sub-challenge, to classify whether the speaker is wearing a mask. On each of these tasks, an ensemble outperforms the single E2E model. On the breathing sub-challenge, we study the impact of multi-loss strategies on task performance. On the elderly sub-challenge, predicting the valence and arousal levels prompts us to investigate multi-task training and implement data sampling strategies to handle class imbalance. On the mask sub-challenge, an E2E system without feature engineering is competitive with feature-engineered baselines and provides substantial gains when combined with them.
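As a rough illustration of the ensembling idea in the abstract, the sketch below averages the class probabilities produced by several models and picks the most probable class. The abstract does not say how the ensemble members are combined, so plain averaging is an assumption here, and the function names `average_ensemble` and `predict_labels` are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(
    model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over models (assumed combination rule).

    model_probs[m][i][c] is the probability that model m assigns to
    class c for utterance i. All models must score the same utterances.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_models = len(model_probs)
    n_items = len(model_probs[0])
    averaged = []
    for i in range(n_items):
        n_classes = len(model_probs[0][i])
        averaged.append([
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ])
    return averaged


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the most probable class for each utterance."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]
```

For a two-class task such as mask versus no mask, each inner list would hold two probabilities per utterance. Averaging lets a single model's unstable decision be outvoted by the others, which is the kind of robustness the abstract attributes to ensembles.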
