Paper title
Conversational Speech Recognition By Learning Conversation-level Characteristics
Paper authors
Paper abstract
Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech that involves multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of conversation-specific characteristics such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics within the prevalent end-to-end neural framework. The proposed model has two highlights. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the decoder's outputs toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a relative character error rate (CER) reduction of up to 12%.
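To make the reported metric concrete, the following is a minimal sketch of how a character error rate and a relative CER reduction are commonly computed. It is not the authors' evaluation code: the function names (`edit_distance`, `cer`, `relative_reduction`) are hypothetical, and the example strings and numbers are illustrative values, not results from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative only: one substituted character in a six-character reference.
    print(cer("今天天气很好", "今天天汽很好"))
    # Illustrative only: a hypothetical baseline CER of 25% lowered to 22%.
    print(relative_reduction(0.25, 0.22))
```

Under this definition, a relative reduction is measured against the baseline error rate, so a 12% relative reduction is smaller than a drop of 12 absolute CER points.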