Paper Title

Self-stabilizing Total-order Broadcast

Authors

Oskar Lundström, Michel Raynal, Elad Michael Schiller

Abstract

The problem of total-order (uniform reliable) broadcast is fundamental in fault-tolerant distributed computing since it abstracts a broad set of problems requiring processes to uniformly deliver messages in the same order in which they were sent. Existing solutions (that tolerate process failures) reduce the total-order broadcast problem to that of multivalued consensus. Our study aims at the design of an even more reliable solution. We do so through the lens of self-stabilization, a very strong notion of fault tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact). This work proposes the first (to the best of our knowledge) self-stabilizing algorithm for total-order (uniform reliable) broadcast for asynchronous message-passing systems prone to process failures and transient faults. As we show, the proposed solution facilitates the elegant construction of self-stabilizing state-machine replication using bounded memory.
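
The reduction of total-order broadcast to multivalued consensus mentioned in the abstract can be illustrated with a toy, single-machine simulation. This is only a sketch of the classic round-based idea, not the self-stabilizing algorithm the paper proposes: it ignores crashes, asynchrony, transient faults, and bounded memory. All names (`Process`, `decide`, `run_rounds`) are hypothetical, and `decide` is a deterministic stand-in for a real consensus object.

```python
from typing import Dict, List, Set, Tuple

# (sender id, per-sender sequence number, payload); hypothetical message format
Message = Tuple[int, int, str]


def decide(proposals: Dict[int, List[Message]]) -> List[Message]:
    """Toy stand-in for multivalued consensus (an assumption of this sketch).

    Every process obtains the same value, and that value is one of the
    proposals: here, the non-empty proposal of the lowest process id.
    """
    for pid in sorted(proposals):
        if proposals[pid]:
            return proposals[pid]
    return []


class Process:
    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.pending: Set[Message] = set()   # received but not yet delivered
        self.delivered: List[Message] = []   # total-order delivery sequence

    def receive(self, msg: Message) -> None:
        """Models reception of a (reliably) broadcast message."""
        if msg not in self.delivered:
            self.pending.add(msg)

    def propose(self) -> List[Message]:
        """Propose the current pending batch in a deterministic order."""
        return sorted(self.pending)

    def deliver_batch(self, batch: List[Message]) -> None:
        """Deliver the decided batch in its agreed order, skipping duplicates."""
        for msg in batch:
            if msg not in self.delivered:
                self.delivered.append(msg)
            self.pending.discard(msg)


def run_rounds(processes: List[Process]) -> None:
    """One consensus instance per round; stop when nothing is pending."""
    while True:
        proposals = {p.pid: p.propose() for p in processes}
        batch = decide(proposals)
        if not batch:
            break
        for p in processes:
            p.deliver_batch(batch)


if __name__ == "__main__":
    procs = [Process(i) for i in range(3)]
    m1, m2, m3 = (0, 1, "x=1"), (1, 1, "x=2"), (2, 1, "x=3")
    # Processes see different subsets of messages, in different orders.
    procs[0].receive(m1)
    procs[1].receive(m2)
    procs[1].receive(m1)
    procs[2].receive(m3)
    procs[2].receive(m2)
    run_rounds(procs)
    for p in procs:
        print(p.pid, p.delivered)
```

Because every process delivers the same decided batch in each round, all delivery sequences end up identical; applying the delivered messages as commands to a deterministic state machine at each process would then yield identical replica states, which is the state-machine replication use case the abstract refers to.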
