与部分恢复的编码分布式计算

论文标题

与部分恢复的编码分布式计算

Coded Distributed Computing with Partial Recovery

论文作者

Ozfatura, Emre, Ulukus, Sennur, Gunduz, Deniz

论文摘要

编码计算技术为分布式计算中的散布工人提供了鲁棒性。但是，大多数现有方案都需要精确提供散乱行为，而忽略了散乱的工人进行的计算。此外，这些方案通常旨在准确恢复所需的计算结果，而在许多机器学习和迭代优化算法中，众所周知，更快的近似解决方案可以改善整体收敛时间。在本文中，我们首先引入了一种新颖的编码矩阵矢量乘法方案，称为“与部分恢复”（CCPR）的编码计算方案，该方案受益于编码和未编码的计算方案的优势，从而使计算时间和计算时间既可以通过精确的计算速度和计算速度来降低解码复杂性。然后，我们通过提出具有部分恢复的编码通信方案来扩展这种方法，以分布更通用的计算任务，其中在交流之前对工人计算的子任务的结果进行了编码。大型线性回归任务上的数值模拟证实了所提出的分布式计算方案的好处，并在计算精度和延迟之间的权衡方面进行了部分回收。

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behaviour and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed distributed computation scheme with partial recovery in terms of the trade-off between the computation accuracy and latency.

下载PDF全文

下载文献需遵守相关版权规定

论文标题