Paper Title

End-to-End Neural Transformer Based Spoken Language Understanding

Paper Authors

Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann

Paper Abstract

Spoken language understanding (SLU) refers to the process of inferring semantic information from audio signals. While neural transformers consistently deliver the best performance among state-of-the-art neural architectures in the field of natural language processing (NLP), their merits in a closely related field, i.e., spoken language understanding (SLU), have not been investigated. In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict variable-length domain, intent, and slot vectors embedded in an audio signal with no intermediate token prediction architecture. This new architecture leverages the self-attention mechanism, by which the audio signal is transformed into various subspaces, allowing the model to extract the semantic context implied by an utterance. Our end-to-end transformer SLU predicts the domains, intents, and slots in the Fluent Speech Commands dataset with accuracies of 98.1%, 99.6%, and 99.6%, respectively, and outperforms SLU models that leverage a combination of recurrent and convolutional neural networks by 1.4%, while our model is 25% smaller than those architectures. Additionally, due to the independent sub-space projections in the self-attention layer, the model is highly parallelizable, which makes it a good candidate for on-device SLU.
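The abstract's key architectural claim is that the independent per-head sub-space projections inside self-attention are what make the model parallelizable. As a rough, self-contained illustration (not the authors' implementation; the function name, shapes, and the random stand-in weights are all hypothetical), here is a minimal NumPy sketch of multi-head self-attention over a sequence of audio-frame features:

```python
import numpy as np

def multi_head_self_attention(x, num_heads, rng=np.random.default_rng(0)):
    """Minimal multi-head self-attention over an acoustic feature sequence.

    x: (seq_len, d_model) array of audio-frame features.
    Each head projects x into its own subspace with independent weight
    matrices, so the per-head computations below could run in parallel.
    Random matrices stand in for learned weights; the final output
    projection is omitted for brevity.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        # Independent query/key/value projections for this head's subspace.
        w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        w_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        # Scaled dot-product attention within the head's subspace.
        scores = q @ k.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v)
    # Concatenate per-head context vectors back to d_model dimensions.
    return np.concatenate(outputs, axis=-1)

# Example: 100 audio frames with 64-dimensional features, 4 heads.
context = multi_head_self_attention(np.random.randn(100, 64), num_heads=4)
print(context.shape)  # (100, 64)
```

Because each head touches only its own weight matrices, the per-head loop above has no cross-head dependencies and can execute concurrently, which is the parallelism property the abstract cites as an advantage for on-device deployment.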
