Title:
Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets
Authors:
Abstract:
Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any labels. Even though several promising predictive coding-based learning algorithms have been proposed in the literature, it is currently unclear how well they generalise to different languages and training dataset sizes. In addition, although such models have been shown to be effective phonemic feature learners, it is unclear whether minimisation of the predictive loss functions of these models also leads to optimal phoneme-like representations. The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding (APC) and Contrastive Predictive Coding (CPC), in a phoneme discrimination task (ABX task) for two languages with training datasets of different sizes. Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores on both datasets. However, to our surprise, the CPC model converges rapidly, already after a single pass over the training data, and, on average, its representations outperform those of APC for both languages.
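To make the two objectives concrete, the sketch below shows simplified versions of the losses the abstract refers to: an autoregressive (APC-style) loss that predicts a frame a few steps ahead and penalises the L1 error, and a contrastive (CPC-style) InfoNCE loss that scores the true future latent against negative samples. This is an illustrative numpy sketch under assumed shapes, not the papers' actual implementations; the encoder, projection heads, and sampling strategy are all omitted.

```python
import numpy as np

def apc_loss(features, predictions, shift=3):
    """APC-style objective: predict the frame `shift` steps ahead and
    take the mean absolute (L1) error against the true frames.

    features:    (T, D) array of true feature frames
    predictions: (T - shift, D) array of model predictions, where
                 predictions[t] targets features[t + shift]
    """
    return np.abs(features[shift:] - predictions).mean()

def cpc_infonce_loss(context, future, negatives):
    """CPC-style InfoNCE objective for a single prediction step.

    context:   (D,) context vector (any step-specific projection is
               assumed to be folded in already)
    future:    (D,) true future latent (the positive sample)
    negatives: (N, D) distractor latents (the negative samples)
    """
    # Dot-product scores, positive sample first.
    scores = np.concatenate(([context @ future], negatives @ context))
    # Numerically stable log-softmax; the loss is the negative
    # log-probability assigned to the positive sample.
    m = scores.max()
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return -log_probs[0]
```

As the abstract's correlation result suggests, driving `apc_loss` down tends to go together with better ABX phoneme discrimination; the InfoNCE loss instead shapes representations by making the true future distinguishable from distractors.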