Paper Title

Multi-level Forwarding and Scheduling Recovery Algorithm in Rapidly-changing Network for Erasure-coded Clusters

Authors

Zhou, Hai; Feng, Dan; Hu, Yuchong

Abstract

A key design goal of erasure-coded clusters is to reduce repair time. Existing erasure-coded data repair schemes fall roughly into two categories: (1) schemes designed for fast data repair in a homogeneous environment (e.g., PPR), and (2) schemes that construct the repair based on link bandwidth in a heterogeneous environment (e.g., PPT). However, these solutions struggle to cope with the heterogeneous and rapidly changing networks found in erasure-coded clusters. To address this problem, a bandwidth-aware multi-level forwarding repair algorithm, called BMFRepair, is proposed. BMFRepair monitors network bandwidth in real time while data is being forwarded and selects idle nodes with high-bandwidth links to assist in forwarding, which reduces the time bottleneck caused by low-bandwidth links. In addition, multi-node repair becomes very complicated when bandwidth changes drastically. For this case, a multi-node scheduling repair algorithm, called MSRepair, is proposed, which repairs multiple failed blocks in parallel by scheduling node resources. The two algorithms adapt flexibly to a rapidly changing network environment and make full use of the bandwidth resources of idle nodes. Most importantly, they continuously adjust the repair plan as bandwidth changes in a fast, dynamic network. The algorithms have been evaluated through simulations on Mininet and real experiments on the Aliyun cloud platform ECS. Results show that, compared with the state-of-the-art repair schemes PPR and PPT, the algorithms significantly reduce repair time in rapidly changing networks.
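The idea of routing around a slow link through an idle node can be illustrated with a minimal sketch. It is not the paper's BMFRepair algorithm: it assumes a single relay hop rather than multi-level forwarding, a static bandwidth snapshot rather than real-time monitoring, and a hypothetical function name (`best_path`) with made-up node names and bandwidth values.

```python
def best_path(bw, src, dst, idle_nodes):
    """Pick the direct link or the best one-relay detour (illustrative only).

    bw maps (sender, receiver) pairs to measured bandwidth in Mb/s.
    A path's throughput is taken to be the bandwidth of its slowest link.
    """
    best = ([src, dst], bw[(src, dst)])
    for relay in idle_nodes:
        # Skip relays for which either hop has no measurement.
        if (src, relay) not in bw or (relay, dst) not in bw:
            continue
        bottleneck = min(bw[(src, relay)], bw[(relay, dst)])
        if bottleneck > best[1]:
            best = ([src, relay, dst], bottleneck)
    return best


# Hypothetical snapshot: the direct link A -> R is the slow one.
bandwidth = {
    ("A", "R"): 20,
    ("A", "B"): 90, ("B", "R"): 80,
    ("A", "C"): 50, ("C", "R"): 100,
}

path, rate = best_path(bandwidth, "A", "R", idle_nodes=["B", "C"])
print(path, rate)  # ['A', 'B', 'R'] 80
```

Re-running the selection whenever the bandwidth snapshot changes would mimic, in a very simplified way, the continuous adjustment of the repair plan that the abstract describes.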
