Paper Title

Distributed Learning with Low Communication Cost via Gradient Boosting Untrained Neural Network

Paper Authors

Xiatian Zhang, Xunshi He, Nan Wang, Rong Chen

Paper Abstract

For high-dimensional data, distributed GBDT incurs huge communication costs because the communication volume of GBDT is related to the number of features. To overcome this problem, we propose a novel gradient boosting algorithm, the Gradient Boosting Untrained Neural Network (GBUN). GBUN ensembles untrained, randomly generated neural networks that softly distribute data samples to multiple neuron outputs, which dramatically reduces the communication cost of distributed learning. To avoid creating huge neural networks for high-dimensional data, we extend the Simhash algorithm to mimic the forward calculation of the neural network. Our experiments on multiple public datasets show that GBUN matches conventional GBDT in prediction accuracy and scales much better for distributed learning. Compared to conventional GBDT variants, GBUN speeds up the training process by up to 13 times on a cluster with 64 machines, and by up to 4614 times on a cluster with 100 KB/s network bandwidth. Therefore, GBUN is not only an efficient distributed learning algorithm but also has great potential for federated learning.
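
The abstract gives only a high-level description of GBUN, but the core mechanism it names, an untrained, randomly generated network that softly assigns each sample to several neuron outputs and serves as the weak learner in gradient boosting, can be illustrated with a small sketch. The code below is an interpretation under assumptions, not the authors' implementation: the class `RandomSoftAssigner`, the softmax-based soft assignment, the squared-error loss, and all hyperparameter names are hypothetical choices made for illustration only.

```python
# Minimal sketch (NOT the authors' implementation) of the idea described in the
# abstract: an untrained, randomly generated network softly assigns each sample
# to several neuron outputs, and those outputs play the role of "leaves" for a
# gradient-boosting weak learner. Names and details below are assumptions.
import numpy as np

class RandomSoftAssigner:
    """Untrained random projection followed by a softmax, giving each sample
    a soft membership over `n_neurons` outputs."""
    def __init__(self, n_features, n_neurons, rng):
        self.W = rng.normal(size=(n_features, n_neurons))
        self.b = rng.normal(size=n_neurons)

    def __call__(self, X):
        z = X @ self.W + self.b                  # untrained forward pass
        z -= z.max(axis=1, keepdims=True)        # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)  # soft assignment matrix

def fit_gbun_like(X, y, n_rounds=50, n_neurons=32, lr=0.1, seed=0):
    """Toy regression booster: each round fits one value per neuron to the
    current residuals, weighted by the soft assignments. In a distributed
    setting only these per-neuron statistics (one number per neuron) would
    need to be exchanged, which is the abstract's stated advantage over
    feature-wise GBDT histograms."""
    rng = np.random.default_rng(seed)
    pred = np.zeros(len(y))
    learners = []
    for _ in range(n_rounds):
        assigner = RandomSoftAssigner(X.shape[1], n_neurons, rng)
        A = assigner(X)                          # (n_samples, n_neurons)
        residual = y - pred                      # negative gradient for squared loss
        # Weighted least-squares value per neuron (analogous to a leaf value).
        values = (A * residual[:, None]).sum(0) / (A.sum(0) + 1e-12)
        pred += lr * A @ values
        learners.append((assigner, values))
    return learners

def predict(learners, X, lr=0.1):
    pred = np.zeros(len(X))
    for assigner, values in learners:
        pred += lr * assigner(X) @ values
    return pred
```

The communication argument in the comments follows the abstract: each weak learner only needs one aggregated statistic per neuron rather than per-feature histograms, so the volume exchanged between workers no longer grows with the number of features. The Simhash-based approximation of the forward pass for high-dimensional data, which the abstract mentions, is not reproduced in this sketch.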
