Paper Title
On Manifold Hypothesis: Hypersurface Submanifold Embedding Using Osculating Hyperspheres
Paper Authors
Paper Abstract
Consider a set of $n$ data points in the Euclidean space $\mathbb{R}^d$. This set is called a dataset in machine learning and data science. The manifold hypothesis states that the dataset lies on a low-dimensional submanifold with high probability. All dimensionality reduction and manifold learning methods assume the manifold hypothesis. In this paper, we show that the dataset lies on an embedded hypersurface submanifold which is locally $(d-1)$-dimensional. Hence, we show that the manifold hypothesis holds at least for the embedding dimensionality $d-1$. Using induction in a pyramid structure, we also extend the result to lower embedding dimensionalities, showing the validity of the manifold hypothesis for embedding dimensionalities $\{1, 2, \dots, d-1\}$. To embed the hypersurface, we first construct the $d$-nearest-neighbors graph of the data. For every point, we fit an osculating hypersphere $S^{d-1}$ using its neighbors, where this hypersphere is osculating to a hypothetical hypersurface. Then, using surgery theory, we apply surgery on the osculating hyperspheres to obtain $n$ hyper-caps. We connect the hyper-caps to one another using partial hyper-cylinders. By connecting all parts, the embedded hypersurface is obtained as the disjoint union of these elements. We discuss the geometrical characteristics of the embedded hypersurface, such as having a boundary, its topology, smoothness, boundedness, orientability, compactness, and injectivity. Some discussion is also provided on the linearity and structure of data. This paper lies at the intersection of several fields of science, including machine learning, differential geometry, and algebraic topology.
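The first two steps of the construction (building the $d$-nearest-neighbors graph and fitting one hypersphere per point) can be illustrated with a minimal sketch. The abstract does not specify the fitting procedure, so the code below rests on an assumption: it takes the hypersphere passing through each point and its $d$ nearest neighbors, which is a discrete stand-in for an osculating hypersphere, not necessarily the authors' method. The function names `d_nearest_neighbors`, `fit_hypersphere`, and `fit_all_hyperspheres` are hypothetical, and the surgery and hyper-cylinder steps are not covered.

```python
import numpy as np


def d_nearest_neighbors(X):
    """For each row of X (shape n x d), return the indices of its d nearest neighbors."""
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :d]


def fit_hypersphere(points):
    """Fit a hypersphere through d+1 points in R^d (the rows of `points`).

    Assumption: interpolation through the points is used as a stand-in for
    an osculating fit. Subtracting ||p_0 - c||^2 = r^2 from
    ||p_i - c||^2 = r^2 gives a linear system for the center c:
        2 (p_i - p_0) . c = ||p_i||^2 - ||p_0||^2
    """
    p0, rest = points[0], points[1:]
    A = 2.0 * (rest - p0)
    b = np.sum(rest ** 2, axis=1) - np.sum(p0 ** 2)
    # Least squares falls back to a minimum-norm solution if the points
    # are degenerate (e.g. all on one hyperplane).
    center, *_ = np.linalg.lstsq(A, b, rcond=None)
    radius = np.linalg.norm(p0 - center)
    return center, radius


def fit_all_hyperspheres(X):
    """Return one (center, radius) pair per data point, using its d nearest neighbors."""
    neighbors = d_nearest_neighbors(X)
    spheres = []
    for i, nbrs in enumerate(neighbors):
        pts = np.vstack([X[i], X[nbrs]])  # the point itself plus its d neighbors
        spheres.append(fit_hypersphere(pts))
    return spheres
```

For example, if the rows of `X` are points sampled from the unit circle in $\mathbb{R}^2$, every fitted hypersphere should recover a center near the origin and a radius near 1, since any three distinct points on a circle determine it.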