Coded Data Rebalancing for Decentralized Distributed Databases

Paper Title

Coded Data Rebalancing for Decentralized Distributed Databases

Authors

Sree, K V Sushena; Krishnan, Prasad

Abstract

The performance of replication-based distributed databases degrades when storage is non-uniform across storage nodes (also called \textit{data skew}) and when the replication factor drops during operation, particularly because of node additions or removals. Data rebalancing refers to the communication between nodes needed to correct this data skew while maintaining the replication factor. For carefully designed distributed databases, transmitting coded symbols during the rebalancing phase has recently been shown to reduce the communication load of rebalancing. In this work, we look at balanced distributed databases with \textit{random placement}, in which each data segment is stored in a random subset of $r$ nodes in the system, where $r$ is the replication factor of the distributed database. We call these decentralized databases. For a natural class of such decentralized databases, we propose rebalancing schemes that correct the data skew and the reduction in the replication factor arising from a single node addition or removal. We give converse arguments showing that our proposed rebalancing schemes are asymptotically optimal in the size of the file.
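
The random placement model and the problem that rebalancing must solve can be illustrated with a minimal Python sketch. It is not the rebalancing scheme proposed in the paper: it only places each segment on a random subset of $r$ nodes, then removes one node and counts the segments whose replication factor has dropped. All function names and parameter values (`random_placement`, `node_loads`, `remove_node`, 10,000 segments, 5 nodes, $r = 3$) are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_placement(num_segments, num_nodes, r, seed=0):
    """Store each segment on a uniformly random subset of r distinct nodes."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(num_nodes), r))
            for _ in range(num_segments)]


def node_loads(placement, nodes):
    """Count how many segment replicas each node stores."""
    loads = Counter({n: 0 for n in nodes})
    for replicas in placement:
        for n in replicas:
            loads[n] += 1
    return loads


def remove_node(placement, node):
    """Drop one node; segments it held now have fewer than r replicas."""
    return [replicas - {node} for replicas in placement]


# Illustrative parameters (assumptions, not taken from the paper).
NUM_SEGMENTS, NUM_NODES, R = 10_000, 5, 3

placement = random_placement(NUM_SEGMENTS, NUM_NODES, R)
loads = node_loads(placement, range(NUM_NODES))

after_removal = remove_node(placement, node=NUM_NODES - 1)
under_replicated = sum(1 for replicas in after_removal if len(replicas) < R)

print("replicas per node before removal:", dict(loads))
print("segments needing a new replica:", under_replicated)
```

With many segments, the per-node loads are close to equal, which is the sense in which such a database is balanced. Every segment that the removed node held must regain a replica on one of the surviving nodes, and the communication needed to do this while keeping storage uniform is the rebalancing load that the abstract refers to.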
