Paper title

Very Fast Keyword Spotting System with Real Time Factor below 0.01

Authors

Jan Nouza, Petr Cerva, Jindrich Zdansky

Abstract

In this paper, we present an architecture of a keyword spotting (KWS) system that is based on modern neural networks, yields good performance on various types of speech data, and can run very fast. We focus mainly on the last aspect and propose optimizations for all the steps required in a KWS design: signal processing and likelihood computation, Viterbi decoding, spot candidate detection, and confidence calculation. We present time- and memory-efficient modelling by bidirectional feedforward sequential memory networks (an alternative to recurrent nets), using either standard triphones or so-called quasi-monophones, and an entirely forward decoding of speech frames (with minimal need for look-back). Several variants of the proposed scheme are evaluated on 3 large Czech datasets (broadcast, internet and telephone, 17 hours in total), and their performance is compared by Detection Error Tradeoff (DET) diagrams and real-time (RT) factors. We demonstrate that the complete system can run in a single pass with an RT factor close to 0.001 if all optimizations (including a GPU for likelihood computation) are applied.
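
To make the RT factors quoted in the abstract concrete, the following is a minimal sketch using the common definition of the real-time factor as processing time divided by audio duration. The function names are hypothetical and not taken from the paper. Applying one RT factor uniformly to the combined 17 hours is an illustrative assumption, not a measurement the abstract reports.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return the RT factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rt_factor: float) -> float:
    """Return the processing time implied by a given RT factor."""
    return audio_seconds * rt_factor


# Total size of the three datasets named in the abstract: 17 hours.
AUDIO_SECONDS = 17 * 3600

if __name__ == "__main__":
    # 0.01 is the bound in the title; 0.001 is the best case in the abstract.
    for rt in (0.01, 0.001):
        seconds = processing_time(AUDIO_SECONDS, rt)
        print(f"RT factor {rt}: about {seconds:.0f} s for 17 h of audio")
```

Under this assumption, an RT factor of 0.01 corresponds to roughly ten minutes of processing for the 17 hours of audio, and an RT factor of 0.001 to roughly one minute.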
