论文标题
高性能数据挖掘的空间填充曲线
Space-filling Curves for High-performance Data Mining
论文作者
论文摘要
从两个或更高维度的空间到保留位置的一维空间,如希尔伯特 - 曲线,peano曲线和Z级地图等空间曲线。他们有许多应用程序,例如搜索结构,计算机图形,数值模拟,加密编码,可用于制造各种算法cache-oplivious。在本文中,我们描述了希尔伯特 - 曲线的一些细节。我们根据Mealy-type的有限自动机来定义希尔伯特 - 曲线,该自动机从二维坐标空间中确定希尔伯特订单值,反之亦然。我们定义了一个无上下文的语法,以在生成的坐标/订单值对数中线性的时间内生成整个曲线,即每个坐标对或订单值的恒定时间。我们还审查了两种不同的策略,这些策略可以使曲线的产生无需通常的限制到类似方形的网格,而侧长是两个的力量。最后,我们详细介绍了一些应用程序,即矩阵乘法,Cholesky分解,Floyd-Warshall算法,K-均值聚类和相似性连接。
Space-filling curves like the Hilbert-curve, Peano-curve and Z-order map natural or real numbers from a two or higher dimensional space to a one dimensional space preserving locality. They have numerous applications like search structures, computer graphics, numerical simulation, cryptographics and can be used to make various algorithms cache-oblivious. In this paper, we describe some details of the Hilbert-curve. We define the Hilbert-curve in terms of a finite automaton of Mealy-type which determines from the two-dimensional coordinate space the Hilbert order value and vice versa in a logarithmic number of steps. And we define a context-free grammar to generate the whole curve in a time which is linear in the number of generated coordinate/order value pairs, i.e. a constant time per coordinate pair or order value. We also review two different strategies which enable the generation of curves without the usual restriction to square-like grids where the side-length is a power of two. Finally, we elaborate on a few applications, namely matrix multiplication, Cholesky decomposition, the Floyd-Warshall algorithm, k-Means clustering, and the similarity join.