Vector-Quantized Autoregressive Predictive Coding

Paper Title

Vector-Quantized Autoregressive Predictive Coding

Authors

Chung, Yu-An; Tang, Hao; Glass, James

Abstract

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks. However, the connection between low self-supervised loss and strong performance in downstream tasks remains unclear. In this work, we propose Vector-Quantized Autoregressive Predictive Coding (VQ-APC), a novel model that produces quantized representations, allowing us to explicitly control the amount of information encoded in the representations. By studying a sequence of increasingly limited models, we reveal the constituents of the learned representations. In particular, we confirm the presence of information with probing tasks, while showing the absence of information with mutual information, uncovering the model's preference in preserving speech information as its capacity becomes constrained. We find that there exists a point where phonetic and speaker information are amplified to maximize a self-supervised objective. As a byproduct, the learned codes for a particular model capacity correspond well to English phones.
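The two ideas the abstract combines, a quantized bottleneck and a future-frame prediction objective, can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-neighbour codebook lookup, the L1 loss, the identity "predictor", and all names (`quantize`, `max_bits_per_frame`, `apc_l1_loss`) and values are assumptions chosen for illustration. It only shows why a codebook of size K caps each frame's code at log2(K) bits, which is one way to read "explicitly control the amount of information encoded in the representations".

```python
import math


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`.

    Nearest-neighbour lookup is an assumption made for this sketch; the
    paper's quantizer may select codes differently.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))
    return index, codebook[index]


def max_bits_per_frame(codebook_size):
    """Upper bound, in bits, on what a single code index can carry."""
    return math.log2(codebook_size)


def apc_l1_loss(predictions, frames, shift):
    """Mean L1 distance between predictions[t] and frames[t + shift]."""
    total, count = 0.0, 0
    for t in range(len(frames) - shift):
        target = frames[t + shift]
        total += sum(abs(p - y) for p, y in zip(predictions[t], target))
        count += len(target)
    return total / count


# Hypothetical toy data: 4 codes of dimension 2, 3 hidden vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
hidden = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]

codes = [quantize(h, codebook) for h in hidden]
indices = [i for i, _ in codes]
quantized = [c for _, c in codes]

# Treat the quantized vectors as predictions of the frame one step ahead
# (an identity predictor, used here only to keep the example short).
frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
loss = apc_l1_loss(quantized, frames, shift=1)
bits = max_bits_per_frame(len(codebook))
```

Shrinking `codebook` lowers `bits`, which mirrors the abstract's "sequence of increasingly limited models": the smaller the codebook, the less each frame's representation can encode.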
