NBSync: Parallelism of Local Computing and Global Synchronization for Fast Distributed Machine Learning in WANs

Cited by: 1
Authors
Zhou, Huaman [1]
Li, Zonghang [1]
Yu, Hongfang [1,2]
Luo, Long [1]
Sun, Gang [1,3]
Affiliations
[1] Univ Elect Sci & Technol China, Key Lab Opt Fiber Sensing & Commun, Minist Educ, Chengdu 610056, Sichuan, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Guangdong, Peoples R China
[3] Agile & Intelligent Comp Key Lab Sichuan Prov, Chengdu 610036, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed machine learning; federated learning; parameter server system; distributed optimization
DOI
10.1109/TSC.2023.3304312
Chinese Library Classification
TP [Automation technology, computer technology]
Subject Classification
0812
Abstract
Recently, due to privacy concerns, distributed machine learning in Wide-Area Networks (DML-WANs) has attracted increasing attention and has been widely deployed to promote intelligence services that rely on geographically distributed data. DML-WANs essentially performs collaborative federated learning over a combination of edge and cloud servers on a large spatial scale. However, efficient model training is challenging for DML-WANs because it is blocked by the high overhead of synchronizing model parameters between computing servers over WANs. The reason is that traditional DML-WANs training methods, e.g., FedAvg, impose a sequential dependency between local model computing and global model synchronization, which intrinsically produces a sequential blockage between them. When computing heterogeneity and low WAN bandwidth coexist, a long block on global model synchronization prolongs the training time and leads to low utilization of local computing. Despite many efforts to alleviate synchronization overhead with novel communication technologies and synchronization methods, such as FedAsync and ESync, they still follow the traditional training pattern with sequential dependency and therefore bring only limited improvements. In this article, we propose NBSync, a novel training algorithm for DML-WANs that greatly speeds up model training by parallelizing local computing and global synchronization. NBSync employs a well-designed pipelining scheme, which properly relaxes the sequential dependency between local computing and global synchronization and processes them in parallel, so as to overlap their operating overhead in the time dimension. NBSync also realizes flexible, differentiated, and dynamic local computing for workers to maximize the overlap ratio in dynamically heterogeneous training environments. Convergence analysis shows that the convergence rate of the NBSync training process is asymptotically equal to that of SSGD, and that NBSync has better convergence efficiency. We implemented a prototype of NBSync based on a popular parameter server system, i.e., MXNet's PS-LITE library, and evaluated its performance on a DML-WANs testbed. Experimental results show that NBSync speeds up training by about 1.43x-2.79x over state-of-the-art distributed training algorithms (DTAs) in DML-WANs scenarios where computing heterogeneity and low WAN bandwidth coexist.
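The pipelining idea described in the abstract, letting local computation proceed while a global synchronization round is still in flight instead of blocking on it, can be illustrated with a minimal sketch. The Python code below is not NBSync's actual algorithm: the Worker class, the push_pull callable standing in for a parameter-server round trip, and the simple model-replacement rule are all hypothetical placeholders, used only to show how local SGD steps and a WAN synchronization can overlap on a background thread.

    # Minimal sketch of overlapping local computing with global synchronization.
    # NOT the NBSync algorithm; Worker, local_step, push_pull, and the merge
    # rule (replace local model with the synced one) are illustrative only.
    import threading
    import numpy as np

    class Worker:
        def __init__(self, model_size: int, lr: float = 0.01):
            self.params = np.zeros(model_size)   # local model copy
            self.lr = lr
            self.fresh_global = None             # latest synced global model, if any
            self.lock = threading.Lock()

        def local_step(self, grad: np.ndarray) -> None:
            """One local SGD step; fold in a freshly synced global model if ready."""
            with self.lock:
                if self.fresh_global is not None:
                    self.params = self.fresh_global  # placeholder merge rule
                    self.fresh_global = None
            self.params = self.params - self.lr * grad

        def sync_in_background(self, push_pull) -> threading.Thread:
            """Start a global synchronization round in parallel with local computing."""
            snapshot = self.params.copy()

            def run():
                new_global = push_pull(snapshot)   # slow WAN round trip
                with self.lock:
                    self.fresh_global = new_global

            t = threading.Thread(target=run, daemon=True)
            t.start()
            return t

    if __name__ == "__main__":
        w = Worker(model_size=4)
        fake_push_pull = lambda p: p * 0.5         # stand-in for a parameter server
        t = w.sync_in_background(fake_push_pull)
        for _ in range(5):                         # local steps overlap the sync
            w.local_step(np.random.randn(4))
        t.join()
        w.local_step(np.random.randn(4))           # next step absorbs the synced model
        print(w.params)

In this toy setup the worker keeps computing while the synchronization thread waits on the (simulated) WAN, which is the overlap in the time dimension that the sequential FedAvg-style pattern cannot achieve.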
Pages: 4115-4127
Page count: 13
Related Papers
50 records in total
  • [1] Stay Fresh: Speculative Synchronization for Fast Distributed Machine Learning
    Zhang, Chengliang
    Tian, Huangshi
    Wang, Wei
    Yan, Feng
    2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 99 - 109
  • [2] TSEngine: Enable Efficient Communication Overlay in Distributed Machine Learning in WANs
    Zhou, Huaman
    Cai, Weibo
    Li, Zonghang
    Yu, Hongfang
    Liu, Ling
    Luo, Long
    Sun, Gang
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (04): : 4846 - 4859
  • [3] Analysis of Global and Local Synchronization in Parallel Computing
    Cicirelli, Franco
    Giordano, Andrea
    Mastroianni, Carlo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (05) : 988 - 1000
  • [4] HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning
    Li, Yijun
    Huang, Jiawei
    Li, Zhaoyi
    Zhou, Shengwen
    Jiang, Wanchun
    Wang, Jianxin
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [5] Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning
    Tang, Tingting
    Ali, Ramy E.
    Hashemi, Hanieh
    Gangwani, Tynan
    Avestimehr, Salman
    Annavaram, Murali
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 628 - 638
  • [6] Edge Computing Solutions for Distributed Machine Learning
    Marozzo, Fabrizio
    Orsino, Alessio
    Talia, Domenico
    Trunfio, Paolo
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 1148 - 1155
  • [7] An Edge Computing Marketplace for Distributed Machine Learning
    Yerabolu, Susham
    Gomena, Samuel
    Aryafar, Ehsan
    Joe-Wong, Carlee
    PROCEEDINGS OF THE 2019 ACM SIGCOMM CONFERENCE POSTERS AND DEMOS (SIGCOMM '19), 2019, : 36 - 38
  • [8] Fast Parameter Synchronization for Distributed Learning with Selective Multicast
    Luo, Shouxi
    Fan, Pingzhi
    Li, Ke
    Xing, Huanlai
    Luo, Long
    Yu, Hongfang
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 4775 - 4780
  • [9] dSyncPS: Delayed Synchronization for Dynamic Deployment of Distributed Machine Learning
    Guo, Yibo
    Wang, An
    PROCEEDINGS OF THE 2022 2ND EUROPEAN WORKSHOP ON MACHINE LEARNING AND SYSTEMS (EUROMLSYS '22), 2022, : 79 - 86
  • [10] DOSP: an optimal synchronization of parameter server for distributed machine learning
    Meiguang Zheng
    Dongbang Mao
    Liu Yang
    Yeming Wei
    Zhigang Hu
    The Journal of Supercomputing, 2022, 78 : 13865 - 13892