论文标题
分析不变数据的索引结构
Analysis of Indexing Structures for Immutable Data
论文作者
论文摘要
在诸如区块链和协作数据分析之类的新兴应用程序中,对数据不可分割性,多元访问访问和篡改的控件有强烈的需求。这将导致三个新的索引结构用于不可变数据,即Merkle Patricia Trie(MPT),Merkle Bucket Tree(MBT)和面向模式的切片树(POS-Tree)。尽管这些结构已在实际应用中采用,但在文献中没有系统地评估其利弊。这使得从业者很难为其应用选择正确的索引结构,因为对每个索引的特征只有有限的理解。 为了减轻上述缺陷,我们对现有数据的现有指数结构进行了全面分析,以评估其渐近性和经验性能。具体而言,我们表明,MPT,MBT和POS-Tree都是最近提出的框架的实例,称为\ my {结构不变且可重复使用的索引(Siri)}。我们建议根据五个基本指标评估SIRI实例:它们的四个索引操作效率(即查找,更新,比较和合并),以及它们的\ my \ my \ my {refleplication atibiation}(即,索引的大小且索引的大小在没有尺寸的情况下进行了缩减)。我们根据这五个指标建立了每个索引的最坏情况保证,并在实验中评估了各种环境中的所有索引。基于我们的理论和经验分析,我们得出结论,POS-Tree是索引不变数据的有利选择。
In emerging applications such as blockchains and collaborative data analytics, there are strong demands for data immutability, multi-version accesses, and tamper-evident controls. This leads to three new index structures for immutable data, namely Merkle Patricia Trie (MPT), Merkle Bucket Tree (MBT), and Pattern-Oriented-Split Tree (POS-Tree). Although these structures have been adopted in real applications, there is no systematic evaluation of their pros and cons in the literature. This makes it difficult for practitioners to choose the right index structure for their applications, as there is only a limited understanding of the characteristics of each index. To alleviate the above deficiency, we present a comprehensive analysis of the existing index structures for immutable data, evaluating both their asymptotic and empirical performance. Specifically, we show that MPT, MBT, and POS-Tree are all instances of a recently proposed framework, dubbed \my{Structurally Invariant and Reusable Indexes (SIRI)}. We propose to evaluate the SIRI instances based on five essential metrics: their efficiency for four index operations (i.e., lookup, update, comparison, and merge), as well as their \my{deduplication ratios} (i.e., the size of the index with deduplication over the size without deduplication). We establish the worst-case guarantees of each index in terms of these five metrics, and we experimentally evaluate all indexes in a large variety of settings. Based on our theoretical and empirical analysis, we conclude that POS-Tree is a favorable choice for indexing immutable data.