Paper Title

Coded Data Rebalancing: Fundamental Limits and Constructions

Authors

Prasad Krishnan, V. Lalitha, Lakshmi Natarajan

Abstract

Distributed databases often suffer from unequal distribution of data among storage nodes, which is known as "data skew". Data skew arises from a number of causes, such as the removal of existing storage nodes and the addition of new empty nodes to the database. Data skew leads to performance degradation and thus necessitates "rebalancing" at regular intervals to reduce the amount of skew. We define an $r$-balanced distributed database as a distributed database in which the storage across the nodes has uniform size, and each bit of the data is replicated in $r$ distinct storage nodes. We consider the problem of designing such balanced databases along with associated rebalancing schemes that maintain the $r$-balanced property under node removal and addition operations. We present a class of $r$-balanced databases (parameterized by the number of storage nodes) that have the property of structural invariance, i.e., the databases designed for different numbers of storage nodes have the same essential structure. For this class of $r$-balanced databases, we present rebalancing schemes that use coded transmissions between storage nodes, and characterize their communication loads under node addition and removal. We show that the communication cost incurred to rebalance our distributed database for node addition and removal is optimal, i.e., it achieves the minimum possible cost among all possible balanced distributed databases and rebalancing schemes.
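To make the $r$-balanced definition concrete, the following minimal Python sketch checks the two conditions stated in the abstract: uniform storage size across nodes, and each unit of data replicated in $r$ distinct nodes. It is an illustration only, not the paper's construction. The names `is_r_balanced` and `cyclic_placement` are hypothetical, data is modeled as equal-size segments rather than bits, "replicated in $r$ distinct nodes" is read as exactly $r$ copies, and the cyclic layout is just a simple assumed placement.

```python
from collections import Counter


def is_r_balanced(placement, r):
    """Check the r-balanced property for a placement.

    placement: dict mapping node id -> set of segment ids stored there.
    Assumption: all segments have equal size, so a node's storage size
    is proportional to the number of segments it holds.
    """
    if not placement:
        return False
    # Condition 1: every node stores the same amount of data.
    sizes = {len(segments) for segments in placement.values()}
    if len(sizes) != 1:
        return False
    # Condition 2: every segment is stored on exactly r distinct nodes.
    copies = Counter(seg for segments in placement.values() for seg in segments)
    return all(count == r for count in copies.values())


def cyclic_placement(num_nodes, r):
    """Hypothetical layout: segment i is stored on nodes i, i+1, ..., i+r-1 (mod num_nodes)."""
    placement = {node: set() for node in range(num_nodes)}
    for seg in range(num_nodes):
        for offset in range(r):
            placement[(seg + offset) % num_nodes].add(seg)
    return placement


if __name__ == "__main__":
    db = cyclic_placement(num_nodes=5, r=2)
    print(is_r_balanced(db, 2))

    # Removing a node leaves some segments with fewer than r copies,
    # which is the situation a rebalancing scheme has to repair.
    after_removal = {node: segs for node, segs in db.items() if node != 4}
    print(is_r_balanced(after_removal, 2))
```

In this toy layout, removing one node breaks the replication condition even though the surviving nodes still hold equal amounts of data, which is why a rebalancing step must move data to restore the property.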
