Paper Title

Teach me how to Interpolate a Myriad of Embeddings

Paper Authors

Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis

Paper Abstract

Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization (ERM). Yet, its extensions focus on the definition of interpolation and the space where it takes place, while the augmentation itself is less studied: For a mini-batch of size $m$, most methods interpolate between $m$ pairs with a single scalar interpolation factor $\lambda$. In this work, we make progress in this direction by introducing MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple. On sequence data, we further extend to dense interpolation and loss computation over all spatial positions. Overall, we increase the number of tuples per mini-batch by orders of magnitude at little additional cost. This is possible by interpolating at the very last layer before the classifier. Finally, to address inconsistencies due to linear target interpolation, we introduce a self-distillation approach to generate and interpolate synthetic targets. We empirically show that our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks. By analyzing the embedding space, we observe that the classes are more tightly clustered and uniformly spread over the embedding space, thereby explaining the improved behavior.
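
The core idea of interpolating many tuples of embeddings at the last layer can be illustrated with a minimal PyTorch-style sketch. This is not the authors' reference implementation: the function name `multimix_interpolate`, the parameters `n_tuples` and `alpha`, and the choice of a Dirichlet distribution for sampling the interpolation vectors are assumptions made here for illustration, based only on the abstract's description (n tuples of length m, one weight vector per tuple, linear target interpolation).

```python
import torch
import torch.nn.functional as F

def multimix_interpolate(z, y, n_tuples=1000, alpha=1.0):
    """Sketch of MultiMix-style interpolation (illustrative, not reference code).

    z: (m, d) embeddings from the last layer before the classifier.
    y: (m, c) one-hot or soft targets for the mini-batch.
    Returns n_tuples interpolated embeddings and targets, each a convex
    combination of all m examples with its own weight vector lambda.
    """
    m = z.size(0)
    # One interpolation vector lambda per tuple; a Dirichlet draw makes each
    # row a convex combination (non-negative, summing to 1) over the m examples.
    lam = torch.distributions.Dirichlet(torch.full((m,), alpha)).sample((n_tuples,))  # (n, m)
    z_mix = lam @ z          # (n, d): interpolated embeddings
    y_mix = lam @ y.float()  # (n, c): linearly interpolated targets
    return z_mix, y_mix

# Usage sketch: classify the mixed embeddings with the final linear layer and
# apply a soft-target cross-entropy loss.
# logits = classifier(z_mix)
# loss = torch.sum(-y_mix * F.log_softmax(logits, dim=1), dim=1).mean()
```

Because the mixing happens on low-dimensional embeddings rather than on input images, increasing `n_tuples` by orders of magnitude adds little cost relative to the forward pass through the backbone, which is the efficiency argument made in the abstract.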
