Convergent autoencoder approximation of low bending and low distortion manifold embeddings

Paper title

Convergent autoencoder approximation of low bending and low distortion manifold embeddings

Authors

Juliane Braunsmann, Marko Rajković, Martin Rumpf, Benedikt Wirth

Abstract

Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parametrization of the data manifold by the manifold in latent space. Good regularity and structure of the embedded manifold may substantially simplify further data processing tasks such as cluster analysis or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own. To perform the training, it is assumed that for pairs of nearby points on the input manifold their local Riemannian distance and their local Riemannian average can be evaluated. The loss functional is computed via Monte Carlo integration with different sampling strategies for pairs of points on the input manifold. Our main theorem identifies a geometric loss functional of the embedding map as the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Because extrinsic flatness is promoted, these embeddings are regular enough that interpolation between not too distant points on the manifold is well approximated by linear interpolation in latent space, as one possible postprocessing step.
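The abstract's ingredients (sampled pairs of nearby points, their Riemannian distance, and their Riemannian average) can be illustrated with a small NumPy sketch. It is not the loss functional from the paper: the two penalty terms below are assumed, simplified stand-ins for "low distortion" and "low bending", and the names `pair_loss`, `encode`, and `lam` are hypothetical. The toy manifold is an arc of the unit circle, where the geodesic distance is the angle difference and the Riemannian average is the point at the mean angle.

```python
import numpy as np

def pair_loss(encode, x, y, dist, avg, lam=1.0):
    """Monte Carlo estimate of an illustrative distortion + bending penalty.

    encode : callable mapping points of shape (n, D) to latent codes (n, d)
    x, y   : (n, D) sampled pairs of nearby points on the data manifold
    dist   : (n,) Riemannian distances between x[i] and y[i]
    avg    : (n, D) Riemannian averages (geodesic midpoints) of the pairs
    lam    : weight of the bending term (assumed hyperparameter)
    """
    fx, fy, fm = encode(x), encode(y), encode(avg)
    # Distortion: latent distance over intrinsic distance should equal 1.
    ratio = np.linalg.norm(fx - fy, axis=1) / dist
    distortion = (ratio - 1.0) ** 2
    # Bending: the image of the midpoint should be the midpoint of the images;
    # dividing by dist**2 turns the gap into a second difference quotient.
    second_diff = np.linalg.norm(fm - 0.5 * (fx + fy), axis=1) / dist**2
    bending = second_diff ** 2
    # The mean over the sampled pairs is the Monte Carlo estimate.
    return float(np.mean(distortion + lam * bending))

def circle(theta):
    """Points on the unit circle in R^2 at the given angles."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Sample nearby pairs on an arc; angles stay inside (0, pi) to avoid wrap-around.
rng = np.random.default_rng(0)
theta = rng.uniform(0.5, 2.0, size=1000)
delta = rng.uniform(0.05, 0.2, size=1000)   # geodesic distance of each pair
x, y = circle(theta), circle(theta + delta)
avg = circle(theta + 0.5 * delta)           # geodesic midpoint of each pair

def encode_angle(p):
    """Isometric, flat embedding of the arc into a 1D latent space."""
    return np.arctan2(p[:, 1], p[:, 0])[:, None]

def encode_identity(p):
    """Keeps the curved arc in R^2, so chords are shorter than arcs."""
    return p

loss_angle = pair_loss(encode_angle, x, y, delta, avg)
loss_identity = pair_loss(encode_identity, x, y, delta, avg)
print(loss_angle, loss_identity)
```

Under these assumptions the angle encoder unrolls the arc isometrically onto a line and incurs essentially zero penalty, while the identity map is penalized both for shortening distances (chord versus arc) and for bending. In a training setting `encode` would be a parametrized network and the estimate would be minimized over its weights.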
