Multi-modal Automated Speech Scoring using Attention Fusion

Paper title

Multi-modal Automated Speech Scoring using Attention Fusion

Authors

Manraj Singh Grover, Yaman Kumar, Sumit Sarin, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

Abstract

In this study, we propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech using attention fusion. The pipeline employs Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions, respectively. Attention fusion is performed on these learned predictive features to learn complex interactions between different modalities before final scoring. We compare our model with strong baselines and find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system. Further, we present a qualitative and quantitative analysis of our model.
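
The abstract does not spell out how the attention fusion step is computed, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes each modality encoder has already produced a fixed-length feature vector, scores each vector with a small tanh layer, and returns the softmax-weighted sum. The names `attention_fuse`, `w`, `b`, and `v` are hypothetical and the parameters are random rather than learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, w, b, v):
    """Fuse modality vectors with additive attention (illustrative assumption).

    features: list of modality vectors, each of length d
    w: d x h projection matrix (list of d rows), b: bias of length h
    v: scoring vector of length h
    Returns (fused vector of length d, one attention weight per modality).
    """
    scores = []
    for f in features:
        # Project the modality vector to a hidden vector, then to a scalar score.
        hidden = [
            math.tanh(sum(f[i] * w[i][j] for i in range(len(f))) + b[j])
            for j in range(len(b))
        ]
        scores.append(sum(hj * vj for hj, vj in zip(hidden, v)))
    weights = softmax(scores)
    d = len(features[0])
    # Weighted sum of the modality vectors, dimension by dimension.
    fused = [sum(wt * f[i] for wt, f in zip(weights, features)) for i in range(d)]
    return fused, weights


# Toy stand-ins for the acoustic and lexical encoder outputs (random, not real data).
rng = random.Random(0)
d, h = 8, 4
acoustic = [rng.gauss(0, 1) for _ in range(d)]
lexical = [rng.gauss(0, 1) for _ in range(d)]
w = [[rng.gauss(0, 1) for _ in range(h)] for _ in range(d)]
b = [0.0] * h
v = [rng.gauss(0, 1) for _ in range(h)]

fused, weights = attention_fuse([acoustic, lexical], w, b, v)
```

In a full system, the fused vector would feed a scoring layer and all parameters would be trained jointly with the encoders. The abstract suggests this but does not detail it.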
