论文标题
一组未知大小的简洁过滤器
Succinct Filters for Sets of Unknown Sizes
论文作者
论文摘要
会员问题要求维护设置$ S \ subseteq [u] $,支持插入和会员查询,即测试是否在集合中。计算确切答案的数据结构称为字典。当允许(小)误报率$ε$时,数据结构称为过滤器。 标准字典或过滤器的空间使用通常取决于上限的$ S $,而实际集合可能会小得多。 PAGH,SEGEV和WIEDER(FOCS'13)是第一个根据当前的$ | s | $研究了具有不同空间用法的过滤器的人。为了使该空间与当前的集合$ n = | s | $匹配,任何过滤器数据结构必须使用$(1-o(1))n(\ log(1/ε)+(1-o(ε))\ log \ log \ log \ log \ log \ log \ log \ log \ log \ log \ log n)$位,与$ n \ log(1/ε)$ n $ n $ n $ in $ ins $ ins $ ins $ | s | s | |他们还提出了一个数据结构,几乎最佳空间为$(1+O(1))n(\ log(1/ε)+O(\ log \ log \ log n))$ bits,前提是$ n> u^{0.001} $,预期摊销的不变的恒定插入时间和最差的固定式固定恒定恒定恒定查找时间。 在这项工作中,我们提出了一个过滤器数据结构,具有两个方面的改进: - 对于具有很高概率的所有插入和查找,它具有恒定的最差时间; - 它使用空间$(1+o(1))n(\ log(1/ε)+\ log \ log n)$ bits $ bits当$ n> u^{0.001} $,实现所有$ε= o(1)$的最佳前导常数。 我们还提出了一个使用$(1+o(1))n \ log(u/n)$位的词典,以当前大小的最佳空间匹配最佳空间,并以恒定时间执行所有操作,并具有很高的概率。
The membership problem asks to maintain a set $S\subseteq[u]$, supporting insertions and membership queries, i.e., testing if a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate $ε$ is allowed, the data structure is called a filter. The space usages of the standard dictionaries or filters usually depend on the upper bound on the size of $S$, while the actual set can be much smaller. Pagh, Segev and Wieder (FOCS'13) were the first to study filters with varying space usage based on the current $|S|$. They showed in order to match the space with the current set size $n=|S|$, any filter data structure must use $(1-o(1))n(\log(1/ε)+(1-O(ε))\log\log n)$ bits, in contrast to the well-known lower bound of $N\log(1/ε)$ bits, where $N$ is an upper bound on $|S|$. They also presented a data structure with almost optimal space of $(1+o(1))n(\log(1/ε)+O(\log\log n))$ bits provided that $n>u^{0.001}$, with expected amortized constant insertion time and worst-case constant lookup time. In this work, we present a filter data structure with improvements in two aspects: - it has constant worst-case time for all insertions and lookups with high probability; - it uses space $(1+o(1))n(\log (1/ε)+\log\log n)$ bits when $n>u^{0.001}$, achieving optimal leading constant for all $ε=o(1)$. We also present a dictionary that uses $(1+o(1))n\log(u/n)$ bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.