Journal of University of Science and Technology of China, 2020, Vol. 50, Issue (8): 1156-1161. DOI: 10.3969/j.issn.0253-2778.2020.08.016

• Research Article •

Group stochastic gradient descent: A tradeoff between straggler and staleness

GAO Xiang, CHEN Li

  1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China
  • Received: 2020-07-01  Revised: 2020-07-28  Accepted: 2020-07-28  Online: 2020-08-31  Published: 2020-07-28
  • Corresponding author: CHEN Li
  • About the author: GAO Xiang, male, born in 1995, master's student. Research field: distributed machine learning. E-mail: xgao0@mail.ustc.edu.cn

Abstract: Distributed stochastic gradient descent (DSGD) is widely used for large-scale machine learning. Two typical implementations of DSGD are synchronous SGD (SSGD) and asynchronous SGD (ASGD). In SSGD, all workers must wait for one another, so the training speed is dragged down to that of the slowest worker, the straggler. In ASGD, stale gradients can result in a poorly trained model. To balance these two effects, a new variant of distributed SGD named group SGD (GSGD) is proposed, which partitions the workers into several groups so that workers with similar computation and communication performance are placed in the same group. Workers in the same group run synchronously, while different groups run asynchronously. The proposed method mitigates the straggler problem, since workers in the same group spend little time waiting for each other, and its staleness remains small, since the number of groups is much smaller than the number of workers. The convergence of the method is proved through theoretical analysis. Simulation results show that the method converges faster than SSGD and ASGD on heterogeneous clusters.
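The grouping scheme described in the abstract can be illustrated with a short, self-contained simulation. The Python sketch below is not the authors' implementation: the least-squares objective, the event-driven timing model, the group compositions, and all names and hyperparameters (the function gsgd, the learning rate, the batch size, the per-worker compute times) are illustrative assumptions. It only demonstrates the control flow: gradients are averaged synchronously inside each group, each group pushes its averaged gradient to the shared model asynchronously, and staleness is therefore bounded by the number of groups rather than the number of workers.

# A minimal, illustrative sketch of group SGD (GSGD): workers are partitioned
# into groups by speed, workers inside a group synchronize (their gradients are
# averaged), and the groups update the shared model asynchronously.
# The problem, timing model and hyperparameters are illustrative assumptions.
import heapq
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize ||X w - y||^2 / (2 n).
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad(w, idx):
    """Mini-batch gradient of the least-squares loss on samples idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def gsgd(groups, lr=0.1, batch=32, total_updates=300):
    """groups: list of lists of per-worker compute times (seconds per round).
    Workers inside one group wait for the slowest member (synchronous within
    the group); different groups push their averaged gradient whenever they
    finish (asynchronous across groups)."""
    w = np.zeros(d)
    version = 0                              # number of updates applied so far
    snapshots = {0: w.copy()}                # model copies seen by in-flight groups
    # Each event: (finish_time, group_id, model_version_the_group_started_from)
    events = [(max(times), g, version) for g, times in enumerate(groups)]
    heapq.heapify(events)

    for _ in range(total_updates):
        t, g, used_version = heapq.heappop(events)
        # Each worker in the group computes a gradient on the (possibly stale)
        # model copy the group pulled when it started this round.
        w_used = snapshots[used_version]
        g_grads = [grad(w_used, rng.integers(0, n, batch)) for _ in groups[g]]
        w -= lr * np.mean(g_grads, axis=0)   # apply the group's averaged gradient
        version += 1
        snapshots[version] = w.copy()
        # The group pulls the fresh model and starts a new round; its duration
        # is determined by the slowest worker in the group.
        heapq.heappush(events, (t + max(groups[g]), g, version))
    return w

# Two fast workers and two slow workers, grouped by speed so that
# intra-group waiting is small and staleness stays low (only 2 groups).
groups = [[0.10, 0.11], [0.55, 0.60]]
w_hat = gsgd(groups)
print("parameter error:", np.linalg.norm(w_hat - w_true))

With the two groups above, the fast pair never waits for the slow pair, while at most one other group can update the model during any group's round, which is the straggler/staleness tradeoff the abstract describes.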

Key words: stochastic gradient descent, distributed machine learning, straggler, staleness
