QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

Paper Title

QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

Authors

Huang, Jinmiao; Gharbieh, Waseem; Wan, Qianhui; Shim, Han Suk; Lee, Chul

Abstract

Current keyword spotting systems are typically trained with a large amount of pre-defined keywords. Recognizing keywords in an open-vocabulary setting is essential for personalizing smart device interaction. Towards this goal, we propose a pure MLP-based neural network that is based on MLPMixer - an MLP model architecture that effectively replaces the attention mechanism in Vision Transformers. We investigate different ways of adapting the MLPMixer architecture to the QbyE open-vocabulary keyword spotting task. Comparisons with the state-of-the-art RNN and CNN models show that our method achieves better performance in challenging situations (10dB and 6dB environments) on both the publicly available Hey-Snips dataset and a larger scale internal dataset with 400 speakers. Our proposed model also has a smaller number of parameters and MACs compared to the baseline models.
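
To make the architecture named in the abstract more concrete, the following is a minimal NumPy sketch of a single generic MLPMixer block applied to a sequence of audio feature frames. It is not the authors' model. The layer sizes, the random weights, the mean pooling over time, and the cosine-similarity comparison between an enrolled example and a query are all illustrative assumptions, and every function name here is hypothetical.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-6):
    # Normalize over the last axis (no learned scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def init_mlp(rng, dim, hidden):
    # Hypothetical initializer: two dense layers mapping dim -> hidden -> dim.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))


def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2


def mixer_block(x, token_params, channel_params):
    # x has shape (frames, channels).
    # Token mixing: the MLP acts along the time axis, shared across channels.
    # This is the step that stands in for attention.
    x = x + mlp(layer_norm(x).T, *token_params).T
    # Channel mixing: the MLP acts along the feature axis, shared across frames.
    x = x + mlp(layer_norm(x), *channel_params)
    return x


def embed(x, token_params, channel_params):
    # Assumption: pool over time to get one fixed-size vector per utterance.
    return mixer_block(x, token_params, channel_params).mean(axis=0)


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Illustrative sizes only: 50 frames of 40-dimensional features.
rng = np.random.default_rng(0)
frames, channels = 50, 40
token_params = init_mlp(rng, frames, 64)
channel_params = init_mlp(rng, channels, 128)

enrolled = rng.normal(size=(frames, channels))  # stand-in for an enrollment example
query = rng.normal(size=(frames, channels))     # stand-in for incoming audio features

score = cosine_similarity(embed(enrolled, token_params, channel_params),
                          embed(query, token_params, channel_params))
```

In a query-by-example setup of this kind, `score` would be compared against a tuned threshold to decide whether the query matches the enrolled keyword. Because token mixing uses a dense layer over the time axis, this sketch requires a fixed number of frames per input.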
