Paper Title

Can 5th Generation Local Training Methods Support Client Sampling? Yes!

Paper Authors

Michał Grudzień, Grigory Malinovsky, Peter Richtárik

Paper Abstract

The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT). While the first two are reasonably well understood, the third component, whose role is to reduce the number of communication rounds needed to train the model, resisted all attempts at a satisfactory theoretical explanation. Malinovsky et al. (2022) identified four distinct generations of LT methods based on the quality of the provided theoretical communication complexity guarantees. Despite a lot of progress in this area, none of the existing works were able to show that it is theoretically better to employ multiple local gradient-type steps (i.e., to engage in LT) than to rely on a single local gradient-type step only in the important heterogeneous data regime. In a recent breakthrough embodied in their ProxSkip method and its theoretical analysis, Mishchenko et al. (2022) showed that LT indeed leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the $5^{\rm th}$ generation of LT methods. However, while these latest generation LT methods are compatible with DS, none of them support CS. We resolve this open problem in the affirmative. In order to do so, we had to base our algorithmic development on new algorithmic and theoretical foundations.
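
For context, the $5^{\rm th}$ generation of LT methods referred to in the abstract is built on the ProxSkip recursion of Mishchenko et al. (2022). A minimal sketch of that recursion, in our own simplified notation, is given below; it illustrates the local-training baseline only, not the client-sampling extension developed in this paper. Here $f$ is the smooth part of the objective, $\psi$ the (possibly nonsmooth) regularizer, $\gamma$ the step size, and $p$ the probability of performing the prox step:

$$
\hat{x}_{t+1} = x_t - \gamma\left(\nabla f(x_t) - h_t\right), \qquad
x_{t+1} =
\begin{cases}
\operatorname{prox}_{\frac{\gamma}{p}\psi}\!\left(\hat{x}_{t+1} - \tfrac{\gamma}{p}\, h_t\right) & \text{with probability } p,\\
\hat{x}_{t+1} & \text{with probability } 1-p,
\end{cases}
\qquad
h_{t+1} = h_t + \frac{p}{\gamma}\left(x_{t+1} - \hat{x}_{t+1}\right).
$$

In the federated reformulation, $\psi$ encodes the consensus constraint across clients, so the prox step amounts to a communication (averaging) round, while the control variates $h_t$ are what allow the intermediate local gradient steps to help even under arbitrarily heterogeneous data.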
