Paper Title

Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization

Paper Authors

Benjamin Milde, Chris Biemann

Paper Abstract

The Sparsespeech model is an unsupervised acoustic model that can generate discrete pseudo-labels for untranscribed speech. We extend the Sparsespeech model to allow for sampling over a random discrete variable, yielding pseudo-posteriorgrams. The degree of sparsity in this posteriorgram can be fully controlled after the model has been trained. We use the Gumbel-Softmax trick to approximately sample from a discrete distribution in the neural network, and this allows us to train the network efficiently with standard backpropagation. The new and improved model is trained and evaluated on the Libri-Light corpus, a benchmark for ASR with limited or no supervision. The model is trained on 600h and 6000h of English read speech. We evaluate the improved model using the ABX error measure and a semi-supervised setting with 10h of transcribed speech. We observe a relative improvement of up to 31.4% on ABX error rates across speakers on the test set with the improved Sparsespeech model on 600h of speech data, and further improvements when we scale the model to 6000h.
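As context for the Gumbel-Softmax trick mentioned in the abstract, below is a minimal NumPy sketch of how a relaxed categorical sample can be drawn by perturbing logits with Gumbel noise and applying a temperature-scaled softmax. The function names, the temperature value, and the 5-class example are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def sample_gumbel(shape, eps=1e-20):
    """Draw Gumbel(0, 1) noise via inverse transform sampling."""
    u = np.random.uniform(low=0.0, high=1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)

def gumbel_softmax(logits, temperature=1.0):
    """Relaxed (differentiable) sample from a categorical distribution.

    Perturbing the logits with Gumbel noise and applying a
    temperature-scaled softmax yields an approximately one-hot vector;
    as temperature -> 0 the sample approaches a hard one-hot draw.
    """
    y = (logits + sample_gumbel(logits.shape)) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical example: logits over 5 pseudo-label classes for one frame.
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3])
soft = gumbel_softmax(logits, temperature=0.5)  # relaxed pseudo-posterior
print(soft, soft.argmax())  # distribution over classes and its discrete pseudo-label
```

In this sketch the temperature acts as a sharpness knob: lowering it pushes the output closer to a one-hot vector, which loosely corresponds to the controllable sparsity of the pseudo-posteriorgrams described in the abstract.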
