DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Ding, Lisang; Jin, Kexin; Ying, Bicheng; Yuan, Kun; Yin, Wotao


arXiv:2306.00256 (cs)

[Submitted on 1 Jun 2023]

Title: DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Authors: Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

Abstract: Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. This communication is governed by the communication topology and the gossip weight matrices. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, which achieves faster training and better scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD's convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.
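The power-of-2 restriction mentioned in the abstract can be illustrated with a small simulation. The sketch below is not DSGD-CECA and is not taken from the paper. It assumes the commonly described form of the one-peer exponential-2 scheme, in which agent i in round k averages its value with agent (i + 2^(k mod tau)) mod n using weights 1/2 and 1/2, where tau = ceil(log2 n). The function names `one_peer_exp2_round` and `consensus_error` are hypothetical and chosen only for this illustration.

```python
import math


def one_peer_exp2_round(values, k):
    """One gossip round (assumed scheme): agent i averages its value
    with the value of agent (i + 2^(k mod tau)) mod n."""
    n = len(values)
    tau = max(1, math.ceil(math.log2(n)))
    hop = 2 ** (k % tau)
    return [0.5 * values[i] + 0.5 * values[(i + hop) % n] for i in range(n)]


def consensus_error(n):
    """Run tau rounds starting from values 0, 1, ..., n-1 and return the
    largest distance of any agent's value from the true average."""
    values = [float(i) for i in range(n)]
    mean = sum(values) / n
    tau = max(1, math.ceil(math.log2(n)))
    for k in range(tau):
        values = one_peer_exp2_round(values, k)
    return max(abs(v - mean) for v in values)


if __name__ == "__main__":
    for n in (6, 8, 12, 16):
        print(n, consensus_error(n))
```

Under these assumptions, every agent holds the exact average after tau rounds when n is a power of 2 (for example n = 8 or n = 16), while for other n (for example n = 6) some agents are counted more often than others and a residual error remains. This is the gap the abstract says DSGD-CECA closes for arbitrary n.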

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.00256 [cs.LG]
	(or arXiv:2306.00256v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.00256

Submission history

From: Lisang Ding
[v1] Thu, 1 Jun 2023 00:29:52 UTC (718 KB)
