Paper title

Measuring spatial uniformity with the hypersphere chord length distribution

Author

Sidiropoulos, Panagiotis

Abstract

Data uniformity is a concept associated with several semantic data characteristics such as lack of features, correlation and sample bias. This article introduces a novel measure to assess data uniformity and detect uniform pointsets in high-dimensional Euclidean spaces. The spatial uniformity measure builds upon the isomorphism between hyperspherical chords and the Euclidean distances between L2-normalised data, which is implied by the fact that, in Euclidean spaces, L2-normalised data can be geometrically defined as points on a hypersphere. The imposed connection between the distance distribution of uniformly selected points and the hyperspherical chord length distribution is employed to quantify uniformity. More specifically, the closed-form expression of the hypersphere chord length distribution is revisited and extended, before examining a few qualitative and quantitative characteristics of this distribution that can be rather straightforwardly linked to data uniformity. The experimental section includes validation in four distinct setups, thus substantiating the potential of the new uniformity measure in practical data-science applications.
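
The geometric idea in the abstract can be illustrated with a minimal sketch. It is not the paper's measure: it only uses the elementary fact that, for independent points drawn uniformly on the unit hypersphere, the expected squared chord length is 2, because d^2 = 2 - 2(x·y) and the expected dot product is zero. The function names, the `shift` parameter, the sample size and the dimension are all assumptions chosen for illustration.

```python
import math
import random


def sample_points(n_points, dim, rng, shift=0.0):
    """L2-normalise Gaussian vectors so every point lies on the unit hypersphere.

    shift=0 gives a uniform pointset; shift>0 biases points towards the first
    axis (a hypothetical way of producing a non-uniform pointset).
    """
    points = []
    for _ in range(n_points):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v[0] += shift
        norm = math.sqrt(sum(x * x for x in v))
        points.append([x / norm for x in v])
    return points


def mean_squared_chord(points):
    """Average squared Euclidean distance (squared chord length) over all pairs."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            count += 1
    return total / count


rng = random.Random(0)
uniform = sample_points(200, 10, rng)
clustered = sample_points(200, 10, rng, shift=5.0)
print(mean_squared_chord(uniform))    # expected to be close to 2
print(mean_squared_chord(clustered))  # expected to be noticeably below 2
```

A single moment such as this is a much cruder check than comparing the full empirical distance distribution against the closed-form chord length distribution, which is the approach the abstract describes.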
