Metric Learning for Keyword Spotting

Paper title

Metric Learning for Keyword Spotting

Authors

Jaesung Huh, Minjae Lee, Heesoo Heo, Seongkyu Mun, Joon Son Chung

Abstract

The goal of this work is to train effective representations for keyword spotting via metric learning. Most existing works address keyword spotting as a closed-set classification problem, where both target and non-target keywords are predefined. Therefore, prevailing classifier-based keyword spotting systems perform poorly on non-target sounds which are unseen during the training stage, causing high false alarm rates in real-world scenarios. In reality, keyword spotting is a detection problem where predefined target keywords are detected from a variety of unknown sounds. This shares many similarities to metric learning problems in that the unseen and unknown non-target sounds must be clearly differentiated from the target keywords. However, a key difference is that the target keywords are known and predefined. To this end, we propose a new method based on metric learning that maximises the distance between target and non-target keywords, but also learns per-class weights for target keywords à la classification objectives. Experiments on the Google Speech Commands dataset show that our method significantly reduces false alarms to unseen non-target keywords, while maintaining the overall classification accuracy.
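
To make the detection framing concrete, the following is a minimal sketch of how an embedding might be scored against learned per-class weights at inference time. It is not the authors' implementation: the function names `l2_normalize` and `detect_keyword`, the use of cosine similarity, the threshold value, and the toy vectors are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def detect_keyword(embedding, class_weights, threshold):
    """Hypothetical detector: return (index, similarities).

    index is the best-matching target keyword, or -1 when no target
    is similar enough, i.e. the input is rejected as non-target.
    """
    e = l2_normalize(embedding)
    w = l2_normalize(class_weights)
    sims = w @ e  # cosine similarity to each target keyword
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return best, sims
    return -1, sims


# Toy 3-dimensional embeddings (assumed values, not real model output).
class_weights = np.array([[1.0, 0.0, 0.0],   # target keyword 0
                          [0.0, 1.0, 0.0]])  # target keyword 1
target_like = np.array([0.9, 0.1, 0.0])      # close to keyword 0
unknown = np.array([0.0, 0.0, 1.0])          # unseen non-target sound

print(detect_keyword(target_like, class_weights, threshold=0.7)[0])
print(detect_keyword(unknown, class_weights, threshold=0.7)[0])
```

In this sketch, an unseen sound is rejected because it is far from every target keyword in the embedding space, rather than being forced into one of a fixed set of predefined classes as a closed-set classifier would do.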
