论文标题
Echo-CGC:单跳广播网络中的一种通信效率拜占庭式分布式机器学习算法
Echo-CGC: A Communication-Efficient Byzantine-tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network
论文作者
论文摘要
在本文中,我们专注于流行的DML框架 - 参数服务器计算范式和迭代学习算法,这些算法在综合上进行。我们旨在降低单跳无线网络中拜占庭式DML算法的通信复杂性。受到Gupta和Vaidya(PODC 2020)开发的CGC过滤器的启发,我们提出了一种基于梯度下降的算法Echo-CGC。我们的主要新颖性是利用无线电网络的广播属性来避免传输原始梯度的机制(完整的$ d $维矢量)。在无线电网络中,每个工人都能偷听传输到参数服务器的先前梯度。在Echo-CGC中粗略地说,如果工人与先前的渐变结合在一起,则将广播“ Echo消息”,而不是其原始的本地梯度。回声消息包含一个系数的向量(最多$ n $)和两个梯度之间的幅度比率(浮点)。相比之下,传统方法需要在每回合中发送$ n $本地梯度,在每个回合中,每个梯度通常都是超高维空间中的向量($ d \ gg n $)。我们算法的通信复杂性的提高取决于多个因素,包括节点数量,执行中的错误工人数量以及成本函数。我们通过数值分析了改进,并表明,使用大量节点,Echo-CGC在标准假设下减少了$ 80 \%的通信。
In this paper, we focus on a popular DML framework -- the parameter server computation paradigm and iterative learning algorithms that proceed in rounds. We aim to reduce the communication complexity of Byzantine-tolerant DML algorithms in the single-hop radio network. Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020, we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the broadcast properties of the radio network to avoid transmitting the raw gradients (full $d$-dimensional vectors). In the radio network, each worker is able to overhear previous gradients that were transmitted to the parameter server. Roughly speaking, in Echo-CGC, if a worker "agrees" with a combination of prior gradients, it will broadcast the "echo message" instead of the its raw local gradient. The echo message contains a vector of coefficients (of size at most $n$) and the ratio of the magnitude between two gradients (a float). In comparison, the traditional approaches need to send $n$ local gradients in each round, where each gradient is typically a vector in an ultra-high dimensional space ($d\gg n$). The improvement on communication complexity of our algorithm depends on multiple factors, including number of nodes, number of faulty workers in an execution, and the cost function. We numerically analyze the improvement, and show that with a large number of nodes, Echo-CGC reduces $80\%$ of the communication under standard assumptions.