Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Authors

Jialu Li, Mark Hasegawa-Johnson

Abstract

Phones, the segmental units of the International Phonetic Alphabet (IPA), are used for lexical distinctions in most human languages; tones, the suprasegmental units of the IPA, are used in perhaps 70% of them. Many previous studies have explored cross-lingual adaptation of automatic speech recognition (ASR) phone models, but few have explored the multilingual and cross-lingual transfer of synchronization between phones and tones. In this paper, we test four Connectionist Temporal Classification (CTC)-based acoustic models, differing in the degree of synchrony they impose between phones and tones. Models are trained and tested multilingually in three languages, then adapted and tested cross-lingually in a fourth. Both synchronous and asynchronous models are effective in both multilingual and cross-lingual settings. Synchronous models achieve a lower error rate in the joint phone+tone tier, but asynchronous training results in a lower tone error rate.
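The abstract reports error rates on separate tiers (joint phone+tone versus tone only). As a rough illustration of how such tiers might be scored, here is a minimal sketch. It assumes a hypothetical label format in which each token is a (phone, tone) pair and toneless phones carry `None`. The helper names and the toy transcripts are illustrative assumptions, not the authors' code or data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def phone_tier(seq):
    """Keep only the phone label of each (phone, tone) pair."""
    return [phone for phone, _ in seq]


def tone_tier(seq):
    """Keep only the tone labels, skipping toneless phones."""
    return [tone for _, tone in seq if tone is not None]


# Hypothetical reference and hypothesis on the joint phone+tone tier.
ref = [("m", None), ("a", "1"), ("m", None), ("a", "3")]
hyp = [("m", None), ("a", "1"), ("n", None), ("a", "3")]

joint_er = error_rate(ref, hyp)                          # joint tier
phone_er = error_rate(phone_tier(ref), phone_tier(hyp))  # phone tier
tone_er = error_rate(tone_tier(ref), tone_tier(hyp))     # tone tier
```

In this toy case the single phone substitution counts against the joint and phone tiers but leaves the tone tier untouched, which shows why the two metrics in the abstract can rank models differently.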
