Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning

Cited by: 10
Authors:
Guo, Pengzhan [1 ]
Ye, Zeyang [1 ]
Xiao, Keli [2 ]
Zhu, Wei [1 ]
Affiliations:
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] SUNY Stony Brook, Coll Business, Stony Brook, NY 11794 USA
Funding:
U.S. National Science Foundation
Keywords:
Optimization; Deep learning; Convergence; Stochastic processes; Mathematical model; Boltzmann distribution; Task analysis; Stochastic optimization; stochastic gradient descent; parallel computing; deep learning; neural network; algorithm
DOI
10.1109/TKDE.2020.3047894
Chinese Library Classification (CLC):
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract
This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel computing strategy, coined the weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically gauges the importance of local workers and weights their contributions accordingly. Furthermore, we have developed an enhanced version of the method, WASGD+, by (1) implementing a designed sample order and (2) upgrading the weight evaluation function. To validate the new method, we benchmark our pipeline against several popular algorithms, including state-of-the-art deep neural network classifier training techniques (e.g., elastic averaging SGD). Comprehensive validation studies have been conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results firmly validate the superiority of the WASGD scheme in accelerating the training of deep architectures. Better still, the enhanced version, WASGD+, is shown to be a significant improvement over its prototype.
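To make the aggregation idea in the abstract concrete, the sketch below shows one plausible form of a decentralized, performance-weighted parameter aggregation step, using Boltzmann-style weights derived from each worker's local loss (the Boltzmann distribution appears among the keywords). This is only an illustrative sketch: the function names, the inverse temperature `beta`, and the toy quadratic loss are assumptions made for this example and are not taken from the paper, whose exact weight evaluation function and update rule may differ.

```python
# Illustrative sketch (not the authors' exact WASGD update): each worker's
# parameter vector is combined using Boltzmann-style weights computed from
# its local loss, with no central (master) variable involved.
import numpy as np


def boltzmann_weights(losses, beta=1.0):
    """Weight workers by exp(-beta * loss), normalized to sum to one.
    Lower local loss -> larger weight (better-performing workers count more)."""
    scores = -beta * np.asarray(losses, dtype=float)
    scores -= scores.max()          # shift by the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()


def weighted_aggregate(params_list, losses, beta=1.0):
    """Combine local parameter vectors as a convex combination with
    performance-based weights."""
    w = boltzmann_weights(losses, beta)
    return sum(wi * p for wi, p in zip(w, params_list))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four simulated local workers with slightly different parameter vectors.
    params_list = [rng.normal(size=5) for _ in range(4)]
    # Toy quadratic loss as a stand-in for each worker's mini-batch loss.
    losses = [float(np.sum(p ** 2)) for p in params_list]
    aggregated = weighted_aggregate(params_list, losses, beta=2.0)
    print("weights:", np.round(boltzmann_weights(losses, beta=2.0), 3))
    print("aggregated parameters:", np.round(aggregated, 3))
```

Subtracting the maximum score before exponentiation is a standard numerical-stability device and does not change the normalized weights.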
Pages: 5037-5050
Page count: 14
Related Papers
(50 in total)
  • [1] Chen, Zhen-Hong; Lan, Yan-Yan; Guo, Jia-Feng; Cheng, Xue-Qi. Distributed stochastic gradient descent with discriminative aggregating. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38(10): 2054-2063.
  • [2] Tian, Yingjie; Zhang, Yuqi; Zhang, Haibin. Recent Advances in Stochastic Gradient Descent in Deep Learning. Mathematics, 2023, 11(3).
  • [3] Elahi, Fatemeh; Fazlali, Mahmood; Malazi, Hadi Tabatabaee; Elahi, Mehdi. Parallel Fractional Stochastic Gradient Descent With Adaptive Learning for Recommender Systems. IEEE Transactions on Parallel and Distributed Systems, 2024, 35(3): 470-483.
  • [4] Needell, Deanna; Ward, Rachel. Batched Stochastic Gradient Descent with Weighted Sampling. Approximation Theory XV, 2017, 201: 279-306.
  • [5] Shi, Shaohuai; Wang, Qiang; Chu, Xiaowen; Li, Bo. A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning. 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS 2018), 2018: 425-432.
  • [6] Teng, Yunfei; Gao, Wenbo; Chalus, Francois; Choromanska, Anna; Goldfarb, Donald; Weller, Adrian. Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019, 32.
  • [7] Lian, Xiangru; Zhang, Wei; Zhang, Ce; Liu, Ji. Asynchronous Decentralized Parallel Stochastic Gradient Descent. International Conference on Machine Learning, Vol. 80, 2018.
  • [8] Backstrom, Karl; Papatriantafilou, Marina; Tsigas, Philippas. The Impact of Synchronization in Parallel Stochastic Gradient Descent. Distributed Computing and Intelligent Technology (ICDCIT 2022), 2022, 13145: 60-75.
  • [9] Hagedorn, Melinda; Jarre, Florian. Optimized convergence of stochastic gradient descent by weighted averaging. Optimization Methods & Software, 2024, 39(4): 699-724.
  • [10] Ayache, G.; El Rouayheb, S. Private weighted random walk stochastic gradient descent. IEEE Journal on Selected Areas in Information Theory, 2021, 2(1): 452-463.