高性能数据挖掘的空间填充曲线

论文标题

高性能数据挖掘的空间填充曲线

Space-filling Curves for High-performance Data Mining

论文作者

Böhm, Christian

论文摘要

从两个或更高维度的空间到保留位置的一维空间，如希尔伯特 - 曲线，peano曲线和Z级地图等空间曲线。他们有许多应用程序，例如搜索结构，计算机图形，数值模拟，加密编码，可用于制造各种算法cache-oplivious。在本文中，我们描述了希尔伯特 - 曲线的一些细节。我们根据Mealy-type的有限自动机来定义希尔伯特 - 曲线，该自动机从二维坐标空间中确定希尔伯特订单值，反之亦然。我们定义了一个无上下文的语法，以在生成的坐标/订单值对数中线性的时间内生成整个曲线，即每个坐标对或订单值的恒定时间。我们还审查了两种不同的策略，这些策略可以使曲线的产生无需通常的限制到类似方形的网格，而侧长是两个的力量。最后，我们详细介绍了一些应用程序，即矩阵乘法，Cholesky分解，Floyd-Warshall算法，K-均值聚类和相似性连接。

Space-filling curves like the Hilbert-curve, Peano-curve and Z-order map natural or real numbers from a two or higher dimensional space to a one dimensional space preserving locality. They have numerous applications like search structures, computer graphics, numerical simulation, cryptographics and can be used to make various algorithms cache-oblivious. In this paper, we describe some details of the Hilbert-curve. We define the Hilbert-curve in terms of a finite automaton of Mealy-type which determines from the two-dimensional coordinate space the Hilbert order value and vice versa in a logarithmic number of steps. And we define a context-free grammar to generate the whole curve in a time which is linear in the number of generated coordinate/order value pairs, i.e. a constant time per coordinate pair or order value. We also review two different strategies which enable the generation of curves without the usual restriction to square-like grids where the side-length is a power of two. Finally, we elaborate on a few applications, namely matrix multiplication, Cholesky decomposition, the Floyd-Warshall algorithm, k-Means clustering, and the similarity join.

下载PDF全文

下载文献需遵守相关版权规定

论文标题