Journal of University of Science and Technology of China, 2020, Vol. 50, Issue (8): 1156-1161. DOI: 10.3969/j.issn.0253-2778.2020.08.016

• Original Paper •

Group stochastic gradient descent: A tradeoff between straggler and staleness

GAO Xiang, CHEN Li   

  1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China
  • Received: 2020-07-01  Revised: 2020-07-28  Accepted: 2020-07-28  Online: 2020-08-31  Published: 2020-07-28

Abstract: Distributed stochastic gradient descent (DSGD) is widely used for large-scale distributed machine learning. Two typical implementations of DSGD are synchronous SGD (SSGD) and asynchronous SGD (ASGD). In SSGD, all workers must wait for each other, so the training speed is slowed down to that of the straggler. In ASGD, stale gradients can result in a poorly trained model. To address this problem, a new distributed SGD method named group SGD (GSGD) is proposed, which divides the workers into several groups, placing workers with similar computation and communication performance in the same group. Workers within a group operate synchronously, while different groups operate asynchronously. The proposed method mitigates the straggler problem, since workers in the same group spend little time waiting for each other, and its staleness is small, since the number of groups is much smaller than the number of workers. The convergence of the method is proved through theoretical analysis, and simulation results show that it converges faster than SSGD and ASGD on a heterogeneous cluster.
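
For intuition, the following is a minimal, self-contained Python sketch of the grouping idea described in the abstract: workers are partitioned into groups of similar speed, gradients within a group are averaged synchronously, and each group's update is applied asynchronously to the shared parameters. All names (group_workers, gsgd), the sort-and-slice grouping rule, and the toy objective are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only; not the paper's code.
import numpy as np

def group_workers(worker_speeds, num_groups):
    """Partition workers into groups of similar speed by sorting and slicing (assumed rule)."""
    order = np.argsort(worker_speeds)          # slowest to fastest
    return np.array_split(order, num_groups)   # contiguous speed ranges

def gsgd(grad_fn, w, worker_speeds, num_groups=2, lr=0.1, steps=100, seed=0):
    """Toy GSGD loop: synchronous averaging inside a group, asynchronous apply across groups."""
    rng = np.random.default_rng(seed)
    worker_speeds = np.asarray(worker_speeds)
    groups = group_workers(worker_speeds, num_groups)
    for _ in range(steps):
        # Approximate asynchrony: groups apply updates in the order they finish,
        # here taken to be the order of their mean computation time.
        finish_order = sorted(groups, key=lambda g: np.mean(worker_speeds[g]))
        w_snapshot = w.copy()                  # parameters each group read (possibly stale)
        for group in finish_order:
            # Synchronous step inside the group: average the members' gradients.
            grads = [grad_fn(w_snapshot, rng) for _ in group]
            w = w - lr * np.mean(grads, axis=0)  # each group applies its update independently
    return w

if __name__ == "__main__":
    # Toy objective: minimize ||w||^2 with noisy gradients; two natural speed clusters.
    grad_fn = lambda w, rng: 2 * w + 0.01 * rng.standard_normal(w.shape)
    speeds = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
    print(gsgd(grad_fn, np.ones(3), speeds, num_groups=2))

In this toy version, staleness appears only within one outer step (all groups read the same snapshot), whereas the paper's setting concerns groups progressing at different rates; the sketch is meant solely to show how grouping bounds both waiting time and staleness.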

Key words: Stochastic gradient descent, distributed machine learning, straggler, staleness
