论文标题

基于Recsplit的高性能构建最小的完美哈希功能

High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

论文作者

Bez, Dominik, Kurpicz, Florian, Lehmann, Hans-Peter, Sanders, Peter

论文摘要

最小的完美哈希功能(MPHF)将丝线映射到第一个| S |整数。它可以用作数据库和数据压缩的构建块。 Recsplit [Esposito等,Alenex'20]目前是最有效的实用最小完美的哈希功能。它在很大程度上依赖于以野蛮的方式尝试哈希功能。我们介绍了旋转拟合,这是一种新技术,通过大幅度减少尝试的哈希功能的数量来使搜索更有效。此外,我们通过利用并行性在位,矢量,核心和GPU的水平上利用并行性来大大改善了恢复时间的构建时间。结合起来,使用GPU在8核CPU上产生的改进可产生高达239的加速度,最高可达5438。原始的单线程Recsplit实现需要1.5小时才能为500万个对象构建MPHF,每个对象为1.56位。在GPU上,我们仅在5秒内实现相同的空间使用情况。鉴于加速度大于能源消耗的增加,我们的实施比原始实施更有效。

A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 Million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源