Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Title

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Authors

Man-Ling Sung, Siyuan Feng, Tan Lee

Abstract

The present study tackles the problem of automatically discovering spoken keywords from untranscribed audio archives without requiring word-by-word speech transcription by automatic speech recognition (ASR) technology. The problem is of practical significance in many applications of speech analytics, including those concerning low-resource languages and large amounts of multilingual and multi-genre data. We propose a two-stage approach, which comprises unsupervised acoustic modeling and decoding, followed by pattern mining in acoustic unit sequences. The whole process starts by deriving and modeling a set of subword-level speech units from untranscribed data. With the acoustic models trained without supervision, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms. For unsupervised acoustic modeling, a deep neural network trained on multilingual speech corpora is used to generate speech segmentation and compute bottleneck features for segment clustering. Experimental results show that the proposed system is able to effectively extract topic-related words and phrases from the lecture recordings on MIT OpenCourseWare.
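
The second stage, pattern mining over pseudo transcriptions, can be illustrated with a minimal sketch. It assumes each recording has already been decoded into a sequence of acoustic unit labels, and it substitutes plain n-gram counting for whatever string mining algorithm the paper actually uses. The function name `mine_frequent_patterns`, its parameters, and the unit labels such as `u3` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def mine_frequent_patterns(sequences, min_len=3, max_len=6, min_count=2):
    """Count contiguous unit n-grams and keep those that recur.

    Assumption: `sequences` is a list of pseudo transcriptions, each a list
    of acoustic unit labels. This is simple n-gram counting, not the
    paper's own mining algorithm.
    """
    counts = Counter()
    for seq in sequences:
        for n in range(min_len, max_len + 1):
            # Slide a window of length n over the unit sequence.
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    # A pattern is a keyword candidate if it occurs at least min_count times.
    return {p: c for p, c in counts.items() if c >= min_count}


# Two toy pseudo transcriptions with hypothetical unit labels.
toy_archive = [
    ["u3", "u7", "u1", "u9"],
    ["u2", "u3", "u7", "u1"],
]
patterns = mine_frequent_patterns(toy_archive, min_len=3, max_len=3)
print(patterns)
```

Exact matching like this is brittle when decoding is noisy, so a practical system would likely need some tolerance for unit substitutions; that choice is left open here.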
