Knowledge Transfer for Efficient On-device False Trigger Mitigation

Paper title

Knowledge Transfer for Efficient On-device False Trigger Mitigation

Authors

Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik

Abstract

In this paper, we address the task of determining whether a given utterance is directed towards a voice-enabled smart-assistant device. An undirected utterance is termed a "false trigger", and false trigger mitigation (FTM) is essential for designing a privacy-centric, non-intrusive smart assistant. The directedness of an utterance can be identified by running automatic speech recognition (ASR) on it and determining the user intent by analyzing the ASR transcript. But in the case of a false trigger, transcribing the audio with ASR is itself strongly undesirable. To alleviate this issue, we propose an LSTM-based FTM architecture which determines the user intent directly from acoustic features, without explicitly generating ASR transcripts from the audio. The proposed models have a small footprint and can be run on-device with limited computational resources. During training, the model parameters are optimized using a knowledge transfer approach in which a more accurate self-attention graph neural network model serves as the teacher. Given whole audio snippets, our approach mitigates 87% of false triggers at a 99% true positive rate (TPR); in a streaming audio scenario, the system listens to only 1.69s of the false trigger audio before rejecting it, while achieving the same TPR.
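
The operating point quoted in the abstract ("mitigates 87% of false triggers at 99% TPR") can be read as follows: pick the score threshold that still accepts the target fraction of true triggers, then measure the fraction of false triggers that falls below it. The sketch below illustrates that calculation only. The function name, the score convention (higher means more likely directed at the device), and the toy scores are assumptions for illustration; they are not taken from the paper.

```python
import math


def mitigation_at_tpr(true_scores, false_scores, target_tpr=0.99):
    """Return (threshold, achieved_tpr, mitigation_rate).

    Hypothetical convention: each score is a model's "directedness" score,
    and an utterance is accepted when score >= threshold.
    """
    if not true_scores or not false_scores:
        raise ValueError("both score lists must be non-empty")

    ranked = sorted(true_scores, reverse=True)
    # Smallest number of true triggers that must be accepted to reach the
    # target TPR (the small epsilon guards against float rounding).
    needed = max(1, math.ceil(target_tpr * len(ranked) - 1e-9))
    threshold = ranked[needed - 1]

    accepted = sum(s >= threshold for s in true_scores)
    rejected = sum(s < threshold for s in false_scores)
    return threshold, accepted / len(true_scores), rejected / len(false_scores)


if __name__ == "__main__":
    # Toy scores, made up for the example.
    true_scores = [0.9, 0.8, 0.7, 0.6]
    false_scores = [0.1, 0.65, 0.75, 0.2]
    print(mitigation_at_tpr(true_scores, false_scores, target_tpr=0.75))
```

With real data, `true_scores` and `false_scores` would come from a held-out evaluation set, and `target_tpr=0.99` would correspond to the operating point reported above.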
