Paper Title
Deep F-measure Maximization for End-to-End Speech Understanding
Paper Authors
Paper Abstract
Spoken language understanding (SLU) datasets, like many other machine learning datasets, usually suffer from the label imbalance problem. Label imbalance usually causes the learned model to replicate similar biases at the output, which raises the issue of unfairness to the minority classes in the dataset. In this work, we approach the fairness problem by maximizing the F-measure instead of accuracy in neural network model training. We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation. We perform experiments on two standard fairness datasets, Adult and Communities and Crime, as well as on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset. In all four of these tasks, F-measure maximization results in improved micro-F1 scores, with absolute improvements of up to 8% compared to models trained with the cross-entropy loss function. In the two multi-class SLU tasks, the proposed approach significantly improves class coverage, i.e., the number of classes with positive recall.
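
The abstract's key technical ingredient is a differentiable approximation to the F-measure trained by backpropagation. Below is a minimal sketch of one common way to realize such an approximation, a "soft" micro-F1 built from predicted probabilities, written in PyTorch. The function name soft_f1_loss and its interface are illustrative assumptions, not the paper's actual implementation.

import torch

def soft_f1_loss(logits, targets, eps=1e-8):
    # Differentiable (soft) micro-F1 loss -- an illustrative sketch of the
    # general idea in the abstract, not the authors' exact formulation.
    # Sigmoid probabilities replace hard 0/1 decisions, so the counts of
    # true positives, false positives, and false negatives become
    # continuous, and the resulting F1 score is differentiable.
    # logits:  (batch, num_classes) raw model outputs
    # targets: (batch, num_classes) multi-hot 0/1 labels
    probs = torch.sigmoid(logits)
    tp = (probs * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    fn = ((1.0 - probs) * targets).sum()
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1  # minimizing the loss maximizes soft F1

A loss of this form can stand in for a standard objective such as torch.nn.BCEWithLogitsLoss in an otherwise unchanged training loop, which matches the abstract's claim that the network is trained with the F-measure objective using standard backpropagation.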