Paper Title
Multiset Synchronization with Counting Cuckoo Filters
Paper Authors
Paper Abstract
Set synchronization is a fundamental task in distributed applications and implementations. Existing methods that synchronize simple sets are mainly based on compact data structures such as the Bloom filter and its variants. However, these methods cannot synchronize a pair of multisets, in which an element may appear multiple times. To this end, we propose to leverage the counting cuckoo filter (CCF), a novel variant of the cuckoo filter, to represent and then synchronize a pair of multisets. The cuckoo filter (CF) is a minimized hash table that uses cuckoo hashing to resolve collisions. A CF has an array of buckets, each of which has multiple slots that store element fingerprints. Building on the CF, the CCF extends each slot into two fields: a fingerprint field and a counter field. The fingerprint field records the fingerprint of the element stored in the slot, while the counter field records the multiplicity of that element. With this design, a CCF can represent any multiset. Once the CCFs representing the local multisets have been generated and exchanged, our query-based and decoding-based methods identify the differing elements between the given multisets. Comprehensive evaluation results indicate that the CCF outperforms the counting Bloom filter (CBF) for multiset synchronization in terms of both synchronization accuracy and space efficiency, at the cost of slightly higher time consumption.
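The slot layout described in the abstract (a fingerprint field plus a counter field, placed by cuckoo hashing) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CountingCuckooFilter`, the helpers `build_ccf` and `query_based_diff`, the SHA-256 based hashing, and all parameter defaults are assumptions made for illustration. The `query_based_diff` function is only one plausible reading of the query-based idea, since the abstract gives no details of it, and the decoding-based method is not sketched at all.

```python
import hashlib
import random
from collections import Counter


class CountingCuckooFilter:
    """Illustrative CCF: each slot holds [fingerprint, counter]."""

    def __init__(self, num_buckets=64, slots_per_bucket=4, fp_bits=16,
                 max_kicks=200, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other candidate bucket depends
        # only on the current bucket and the fingerprint.
        return (index ^ self._hash(fp.to_bytes(4, "big"))) % self.num_buckets

    def _locate(self, item: str):
        fp = self._hash(b"fp:" + item.encode()) % (1 << self.fp_bits)
        i1 = self._hash(item.encode()) % self.num_buckets
        return fp, i1, self._alt(i1, fp)

    def insert(self, item: str, count: int = 1) -> bool:
        fp, i1, i2 = self._locate(item)
        # If the fingerprint is already stored, only bump its counter.
        for i in (i1, i2):
            for slot in self.buckets[i]:
                if slot[0] == fp:
                    slot[1] += count
                    return True
        # Otherwise take a free slot in either candidate bucket.
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append([fp, count])
                return True
        # Both buckets are full: evict a random slot and relocate it.
        i, entry = self.rng.choice((i1, i2)), [fp, count]
        for _ in range(self.max_kicks):
            j = self.rng.randrange(len(self.buckets[i]))
            self.buckets[i][j], entry = entry, self.buckets[i][j]
            i = self._alt(i, entry[0])
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(entry)
                return True
        return False  # filter too full; the last evicted entry is dropped

    def count(self, item: str) -> int:
        """Estimated multiplicity (0 if no matching fingerprint)."""
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            for stored_fp, counter in self.buckets[i]:
                if stored_fp == fp:
                    return counter
        return 0


def build_ccf(multiset: Counter) -> CountingCuckooFilter:
    ccf = CountingCuckooFilter()
    for item, multiplicity in multiset.items():
        ccf.insert(item, multiplicity)
    return ccf


def query_based_diff(local: Counter, remote_ccf: CountingCuckooFilter) -> dict:
    """Assumed query-based step: local multiplicity minus remote estimate."""
    diff = {}
    for item, multiplicity in local.items():
        delta = multiplicity - remote_ccf.count(item)
        if delta != 0:
            diff[item] = delta
    return diff


if __name__ == "__main__":
    a = Counter({"apple": 3, "pear": 1, "fig": 2})
    b = Counter({"apple": 1, "pear": 1, "kiwi": 4})
    print(query_based_diff(a, build_ccf(b)))  # what A holds beyond B
    print(query_based_diff(b, build_ccf(a)))  # what B holds beyond A
```

In this sketch each side queries its own elements against the filter received from the other side, so elements held only by the remote side are found by the symmetric query. Because only fingerprints are stored, two distinct elements that share a fingerprint and a bucket pair are merged into one counter, which is the source of occasional over-counting in any fingerprint-based filter of this kind.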