Communication-Censored Distributed Stochastic Gradient Descent

Cited by: 11
Authors
Li, Weiyu [1 ,2 ]
Wu, Zhaoxian [1 ,3 ,4 ]
Chen, Tianyi [5 ]
Li, Liping [6 ]
Ling, Qing [1 ,3 ,4 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Univ Sci & Technol China, Sch Gifted Young, Hefei 230026, Peoples R China
[3] Sun Yat Sen Univ, Guangdong Prov Key Lab Computat Sci, Guangzhou 510006, Peoples R China
[4] Pazhou Lab, Guangzhou 510300, Peoples R China
[5] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[6] Univ Sci & Technol China, Dept Automat, Hefei 230027, Peoples R China
Keywords
Servers; Convergence; Optimization; Stochastic processes; Machine learning algorithms; Sun; Signal processing algorithms; Communication censoring; communication efficiency; distributed optimization; stochastic gradient descent (SGD); ALGORITHMS;
DOI
10.1109/TNNLS.2021.3083655
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article develops a communication-efficient algorithm for solving stochastic optimization problems defined over a distributed network, aiming to reduce the communication burden in applications such as distributed machine learning. Different from existing works based on quantization and sparsification, we introduce a communication-censoring technique to reduce the number of transmitted variables, which leads to our communication-censored distributed stochastic gradient descent (CSGD) algorithm. Specifically, in CSGD, the latest minibatch stochastic gradient at a worker is transmitted to the server if and only if it is sufficiently informative; when the latest gradient is not available, the server reuses the stale one. To make this communication-censoring strategy effective, the batch size is gradually increased so as to alleviate the effect of stochastic gradient noise. Theoretically, CSGD enjoys the same order of convergence rate as SGD while effectively reducing communication. Numerical experiments demonstrate the sizable communication savings of CSGD.
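To make the censoring rule in the abstract concrete, the following is a minimal Python sketch of the described communication pattern: each worker transmits its minibatch gradient only when it differs sufficiently from the last gradient the server holds, the server otherwise reuses the stale copy, and the batch size grows over iterations. The threshold schedule, step size, and synthetic least-squares data are illustrative assumptions, not the paper's exact algorithm or constants.

```python
# Illustrative sketch of a communication-censored distributed SGD loop;
# the threshold and batch-size schedules below are assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical least-squares problem split across workers.
num_workers, dim, local_n = 4, 10, 200
A = [rng.standard_normal((local_n, dim)) for _ in range(num_workers)]
b = [a @ rng.standard_normal(dim) + 0.1 * rng.standard_normal(local_n) for a in A]

x = np.zeros(dim)                                    # model kept at the server
stale = [np.zeros(dim) for _ in range(num_workers)]  # last gradient each worker sent
transmissions = 0
iterations = 200

for k in range(1, iterations + 1):
    batch = min(8 * k, local_n)   # increasing batch size to suppress gradient noise
    tau = 1.0 / k                 # assumed censoring threshold schedule
    agg = np.zeros(dim)
    for m in range(num_workers):
        idx = rng.choice(local_n, batch, replace=False)
        g = A[m][idx].T @ (A[m][idx] @ x - b[m][idx]) / batch
        # Censoring rule: transmit only if the fresh gradient is sufficiently
        # different from the stale copy already held at the server.
        if np.linalg.norm(g - stale[m]) >= tau:
            stale[m] = g
            transmissions += 1
        agg += stale[m]           # otherwise the server reuses the stale gradient
    x -= (0.01 / num_workers) * agg                  # plain SGD step, fixed step size

print(f"transmissions: {transmissions} of {iterations * num_workers} possible")
```

Comparing the transmission count against the total number of worker-iterations gives a rough sense of the communication saved by censoring under these assumed schedules.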
Pages: 6831-6843
Number of Pages: 13
Related Papers
50 records in total
  • [41] A parallel and distributed stochastic gradient descent implementation using commodity clusters
    Kennedy, Robert K. L.
    Khoshgoftaar, Taghi M.
    Villanustre, Flavio
    Humphrey, Timothy
    [J]. JOURNAL OF BIG DATA, 2019, 6 (01)
  • [42] Local Stochastic Factored Gradient Descent for Distributed Quantum State Tomography
    Kim, Junhyung Lyle
    Toghani, Mohammad Taha
    Uribe, Cesar A.
    Kyrillidis, Anastasios
    [J]. IEEE CONTROL SYSTEMS LETTERS, 2022, 7 : 199 - 204
  • [43] Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning: Examining Distributed and Centralized Stochastic Gradient Descent
    Pu, Shi
    Olshevsky, Alex
    Paschalidis, Ioannis Ch.
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2020, 37 (03) : 114 - 122
  • [44] Communication-efficient Variance-reduced Stochastic Gradient Descent
    Ghadikolaei, Hossein S.
    Magnusson, Sindri
    [J]. IFAC PAPERSONLINE, 2020, 53 (02) : 2648 - 2653
  • [45] Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency
    Deng, Yuyang
    Mahdavi, Mehrdad
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [46] EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent
    Ghosh, Soumyadip
    Gupta, Vijay
    [J]. 2020 IEEE/ACM WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2020) AND WORKSHOP ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR SCIENTIFIC APPLICATIONS (AI4S 2020), 2020, : 1 - 8
  • [47] Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms
    Zhang, Yihan
    Qiu, Meikang
    Gao, Hongchang
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 4602 - 4610
  • [48] Distributed Online Adaptive Gradient Descent With Event-Triggered Communication
    Okamoto, Koki
    Hayashi, Naoki
    Takai, Shigemasa
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (02) : 610 - 622
  • [49] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    [J]. PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1138 - 1152
  • [50] Preconditioned Stochastic Gradient Descent
    Li, Xi-Lin
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (05) : 1454 - 1466