论文标题

了解列表的时刻通过混沌

Understanding the Moments of Tabulation Hashing via Chaoses

论文作者

Houen, Jakob Bæk Tejs, Thorup, Mikkel

论文摘要

简单的表散列可以追溯到1970年的Zobrist,定义如下:每个键都被视为$ c $字符的某些字母$σ$,我们有$ c $完全随机的哈希函数$ h_0,\ ldots,h_ {c -1}} (x_0,\ ldots,x_ {c -1})$ hash hash hash h(x)= h_0(x_0)\ oplus \ ldots \ ldots \ oplus h_ {c - 1}(x_ {x_ {c -1})$ where $ \ oplus $是$ \ oplus $是$ \ oplus $是bitwise xor xor xor operation。 The previous results on tabulation hashing by P{\v a}tra{\c s}cu and Thorup~[J.ACM'11] and by Aamand et al.~[STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. 混沌是$ \ sum a_ {i_0,\ ldots,i_ {c - 1}} x_ {i_0} \ cdot \ ldots \ cdot x_ {i_ {c -1}} $中的随机变量。混沌是一个概率理论的良好概念,在几种情况下,当独立的随机变量是标准的高斯变量以及独立的随机变量具有对数凸尾部时,已经证明了严格的分析。我们注意到,基于哈希的简单制表总和可以将其视为不是独立的混沌之和。这促使我们使用混沌理论中的技术来分析基于哈希的简单制表哈希的总和。 在本文中,我们获得了基于哈希的所有瞬间的界限,以简单的制表哈希,这仅取决于$ c $。与以前的尝试相反,我们的方法主要是分析性的,并且不采用复杂的组合论点。对简单制表哈希的改进分析使我们能够为Dahlgaard等人引入的混合制表哈希的基于哈希总和的矩获得界限。

Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as $c$ characters from some alphabet $Σ$, we have $c$ fully random hash functions $h_0, \ldots, h_{c - 1} \colon Σ\to \{0, \ldots, 2^l - 1\}$, and a key $x = (x_0, \ldots, x_{c - 1})$ is hashed to $h(x) = h_0(x_0) \oplus \ldots \oplus h_{c - 1}(x_{c - 1})$ where $\oplus$ is the bitwise XOR operation. The previous results on tabulation hashing by P{\v a}tra{\c s}cu and Thorup~[J.ACM'11] and by Aamand et al.~[STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Chaoses are random variables of the form $\sum a_{i_0, \ldots, i_{c - 1}} X_{i_0} \cdot \ldots \cdot X_{i_{c - 1}}$ where $X_i$ are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on $c$. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al.~[FOCS'15].

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源