NBSync: Parallelism of Local Computing and Global Synchronization for Fast Distributed Machine Learning in WANs

Cited by: 1
Authors
Zhou, Huaman [1 ]
Li, Zonghang [1 ]
Yu, Hongfang [1 ,2 ]
Luo, Long [1 ]
Sun, Gang [1 ,3 ]
Affiliations
[1] Univ Elect Sci & Technol China, Key Lab Opt Fiber Sensing & Commun, Minist Educ, Chengdu 610056, Sichuan, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China
[3] Agile & Intelligent Comp Key Lab Sichuan Prov, Chengdu 610036, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed machine learning; federated learning; parameter server system; distributed optimization
DOI
10.1109/TSC.2023.3304312
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recently, due to privacy concerns, distributed machine learning in Wide-Area Networks (DML-WANs) has attracted increasing attention and has been widely deployed to promote intelligence services that rely on geographically distributed data. DML-WANs essentially performs collaborative federated learning over a combination of edge and cloud servers on a large spatial scale. However, efficient model training is challenging for DML-WANs because it is blocked by the high overhead of model parameter synchronization between computing servers over WANs. The root cause is that traditional DML-WANs training methods, e.g., FedAvg, impose a sequential dependency between local model computing and global model synchronization, which intrinsically produces a sequential blockage between them. When computing heterogeneity and low WAN bandwidth coexist, long blocking on global model synchronization prolongs training time and leads to low utilization of local computing. Although many efforts, such as FedAsync and ESync, alleviate synchronization overhead with novel communication technologies and synchronization methods, they still follow the traditional training pattern with its sequential dependency and thus achieve very limited improvement. In this article, we propose NBSync, a novel training algorithm for DML-WANs that greatly speeds up model training by parallelizing local computing and global synchronization. NBSync employs a well-designed pipelining scheme that relaxes the sequential dependency between local computing and global synchronization and processes them in parallel, overlapping their overhead in the time dimension. NBSync also realizes flexible, differentiated, and dynamic local computing for workers to maximize the overlap ratio in dynamically heterogeneous training environments. Convergence analysis shows that the convergence rate of the NBSync training process is asymptotically equal to that of SSGD, while NBSync achieves better convergence efficiency. We implemented a prototype of NBSync on top of a popular parameter server system, MXNET's PS-LITE library, and evaluated its performance on a DML-WANs testbed. Experimental results show that NBSync speeds up training by about 1.43x-2.79x compared with state-of-the-art distributed training algorithms (DTAs) in DML-WANs scenarios where computing heterogeneity and low WAN bandwidth coexist.
Pages: 4115 - 4127
Number of pages: 13
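
The pipelining idea described in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation, and `push_pull` and `local_sgd_step` are hypothetical placeholders for a parameter-server round trip and a local training step. The worker launches global synchronization in a background thread and keeps taking local steps on its current model until the fresh global model arrives, so communication and computation overlap in time.

```python
# Minimal sketch (assumed, not from the paper) of overlapping local computing
# with global synchronization in a parameter-server-style worker loop.
import threading
import time
import random

def push_pull(local_model):
    """Simulated WAN push/pull: send the local model, receive the global one."""
    time.sleep(random.uniform(0.5, 1.0))      # stands in for WAN latency / low bandwidth
    return [w * 0.99 for w in local_model]    # placeholder for server-side aggregation

def local_sgd_step(model):
    """One local computing step on the worker's private data (placeholder)."""
    time.sleep(0.1)                           # stands in for local compute time
    return [w - 0.01 * random.random() for w in model]

model = [random.random() for _ in range(4)]

for rnd in range(3):
    snapshot = list(model)                    # model submitted for this round's sync
    result = {}
    sync = threading.Thread(target=lambda: result.update(g=push_pull(snapshot)))
    sync.start()

    # Keep doing local steps while the synchronization is in flight; the number
    # of extra steps naturally adapts to how long the sync actually takes,
    # which mirrors the flexible, dynamic local computing the abstract mentions.
    local_steps = 0
    while sync.is_alive():
        model = local_sgd_step(model)
        local_steps += 1

    sync.join()
    model = result["g"]                       # adopt the freshly synchronized model
    print(f"round {rnd}: overlapped {local_steps} local steps with one synchronization")
```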