Paper Title
Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command Recognition
Paper Authors
Paper Abstract
This work aims to design a low-complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely CNN+(TT-DNN), is composed of convolutional layers at the bottom for spectral feature extraction and TT layers at the top for command classification. Compared with a traditional end-to-end CNN baseline for SCR, our proposed CNN+(TT-DNN) model replaces the fully connected (FC) layers with TT layers, which substantially reduces the number of model parameters while maintaining the baseline performance of the CNN model. We initialize the CNN+(TT-DNN) model either randomly or from a well-trained CNN+DNN, and evaluate the CNN+(TT-DNN) models on the Google Speech Command Dataset. Our experimental results show that the proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model. Furthermore, the CNN+(TT-DNN) model can obtain an accuracy of 97.2% when the number of parameters is increased.
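Below is a minimal PyTorch sketch (not the authors' implementation) illustrating the idea summarized in the abstract: a small CNN feature extractor whose fully connected classification head is replaced by a tensor-train (TT) factorized linear layer. The class names (TTLinear, CNNTTDNN), layer sizes, TT mode shapes, and ranks are illustrative assumptions rather than the configuration reported in the paper.

```python
# Illustrative sketch of a CNN + tensor-train (TT) classification head.
# All sizes, mode shapes, and ranks are assumptions for demonstration only.
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Linear layer whose weight matrix is stored in tensor-train format.

    A dense (prod(in_modes) x prod(out_modes)) weight matrix is replaced by a
    chain of small 4-D cores, so the parameter count drops from
    prod(in_k) * prod(out_k) to sum_k r_k * in_k * out_k * r_{k+1}.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.in_modes, self.out_modes = list(in_modes), list(out_modes)
        # One core per mode pair, shaped (r_k, in_k, out_k, r_{k+1}).
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])
        self.bias = nn.Parameter(torch.zeros(int(torch.prod(torch.tensor(out_modes)))))

    def full_weight(self):
        # Contract the cores back into the dense weight matrix. Done here for
        # clarity; efficient TT layers contract the input with the cores directly.
        w = self.cores[0]                                    # (1, m1, n1, r1)
        for core in self.cores[1:]:
            w = torch.einsum('amnr,rijb->aminjb', w, core)   # absorb next mode pair
            a, M, mk, N, nk, b = w.shape
            w = w.reshape(a, M * mk, N * nk, b)
        return w.reshape(w.shape[1], w.shape[2])             # (prod_in, prod_out)

    def forward(self, x):                                    # x: (batch, prod_in)
        return x @ self.full_weight() + self.bias


class CNNTTDNN(nn.Module):
    """CNN feature extractor with a TT classification head (illustrative sizes)."""

    def __init__(self, n_classes=35):
        super().__init__()
        self.features = nn.Sequential(                       # input: (batch, 1, mel, time)
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        # 32 * 4 * 4 = 512 input features factorized as 8*8*8; 4*4*4 = 64 outputs.
        self.tt = TTLinear(in_modes=(8, 8, 8), out_modes=(4, 4, 4), ranks=(1, 4, 4, 1))
        self.head = nn.Linear(64, n_classes)

    def forward(self, spec):
        h = self.features(spec).flatten(1)
        return self.head(torch.relu(self.tt(h)))


if __name__ == "__main__":
    model = CNNTTDNN()
    dummy = torch.randn(2, 1, 40, 101)                       # batch of 2 log-mel spectrograms
    print(model(dummy).shape)                                # torch.Size([2, 35])
```

With the illustrative shapes above, the TT head stores 1*8*4*4 + 4*8*4*4 + 4*8*4*1 = 768 weight parameters (plus the bias) instead of the 512*64 = 32768 a dense FC layer of the same size would need, which is the kind of parameter reduction the abstract refers to.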