Phoneme Boundary Detection using Learnable Segmental Features

Paper Title

Phoneme Boundary Detection using Learnable Segmental Features

Authors

Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi

Abstract

Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, and keyword spotting. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluate our model when the spoken phonemes are not given as input. Results on the TIMIT and Buckeye corpora suggest that the proposed model is superior to the baseline models and reaches state-of-the-art performance in terms of F1 and R-value. We further explore the use of phonetic transcription as additional supervision and show that this yields minor improvements in performance but substantially better convergence rates. We additionally evaluate the model on a Hebrew corpus and demonstrate that such phonetic supervision can be beneficial in a multilingual setting.
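
The abstract reports results in terms of F1 and R-value. As a rough illustration of how these two scores are commonly computed for boundary detection, here is a minimal Python sketch. It is not the authors' evaluation code: the function name `boundary_scores`, the 20 ms tolerance, and the greedy one-to-one matching are assumptions made for this example. The R-value formula follows the usual definition based on hit rate and over-segmentation.

```python
import math


def boundary_scores(ref, hyp, tol=0.02):
    """Score predicted boundaries against reference boundaries (in seconds).

    Assumptions (not taken from the paper): a prediction counts as a hit if
    it lies within `tol` seconds of a reference boundary, and each prediction
    can match at most one reference boundary (greedy, left to right).
    """
    if not ref:
        raise ValueError("reference boundary list must not be empty")
    ref, hyp = sorted(ref), sorted(hyp)

    hits, j = 0, 0
    for r in ref:
        # Skip predictions that are too early to match this reference.
        while j < len(hyp) and hyp[j] < r - tol:
            j += 1
        if j < len(hyp) and abs(hyp[j] - r) <= tol:
            hits += 1
            j += 1  # consume the matched prediction

    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref)  # also called the hit rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Over-segmentation: relative surplus of predicted boundaries.
    over_seg = len(hyp) / len(ref) - 1
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    r_value = 1 - (abs(r1) + abs(r2)) / 2

    return {"precision": precision, "recall": recall,
            "f1": f1, "r_value": r_value}


# Usage: two of the three predictions fall within 20 ms of a reference.
scores = boundary_scores([0.10, 0.25, 0.40], [0.11, 0.26, 0.50])
```

Unlike F1, the R-value penalizes over-segmentation explicitly, so a model that predicts a boundary at almost every frame gets high recall but a low R-value.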
