Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Paper Title

Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Authors

Huang, Hengguan; Xue, Fuzhao; Wang, Hao; Wang, Ye

Abstract

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as in conversational automatic speech recognition (ASR), as the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach is able to successfully infer relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.
