Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning

Cited by: 10
Authors:
Guo, Pengzhan [1 ]
Ye, Zeyang [1 ]
Xiao, Keli [2 ]
Zhu, Wei [1 ]
Affiliations:
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] SUNY Stony Brook, Coll Business, Stony Brook, NY 11794 USA
Funding:
U.S. National Science Foundation
Keywords:
Optimization; Deep learning; Convergence; Stochastic processes; Mathematical model; Boltzmann distribution; Task analysis; Stochastic optimization; stochastic gradient descent; parallel computing; deep learning; neural network; algorithm
DOI
10.1109/TKDE.2020.3047894
Chinese Library Classification (CLC):
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract
This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel computing strategy, coined the weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically gauges the importance of local workers and weights their contributions accordingly. Furthermore, we have developed an enhanced version of the method, WASGD+, by (1) implementing a designed sample order and (2) upgrading the weight evaluation function. To validate the new method, we benchmark our pipeline against several popular algorithms, including state-of-the-art deep neural network classifier training techniques (e.g., elastic averaging SGD). Comprehensive validation studies have been conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results firmly validate the superiority of the WASGD scheme in accelerating the training of deep architectures. Better still, the enhanced version, WASGD+, is shown to be a significant improvement over its prototype.
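To make the aggregation idea in the abstract concrete, the sketch below shows one plausible form of a decentralized, performance-weighted parameter aggregation step, using Boltzmann-style weights derived from each worker's local loss (the Boltzmann distribution appears among the keywords). This is only an illustrative sketch: the function names, the inverse temperature `beta`, and the toy quadratic loss are assumptions made for this example and are not taken from the paper, whose exact weight evaluation function and update rule may differ.

```python
# Illustrative sketch (not the authors' exact WASGD update): each worker's
# parameter vector is combined using Boltzmann-style weights computed from
# its local loss, with no central (master) variable involved.
import numpy as np


def boltzmann_weights(losses, beta=1.0):
    """Weight workers by exp(-beta * loss), normalized to sum to one.
    Lower local loss -> larger weight (better-performing workers count more)."""
    scores = -beta * np.asarray(losses, dtype=float)
    scores -= scores.max()          # shift by the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()


def weighted_aggregate(params_list, losses, beta=1.0):
    """Combine local parameter vectors as a convex combination with
    performance-based weights."""
    w = boltzmann_weights(losses, beta)
    return sum(wi * p for wi, p in zip(w, params_list))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four simulated local workers with slightly different parameter vectors.
    params_list = [rng.normal(size=5) for _ in range(4)]
    # Toy quadratic loss as a stand-in for each worker's mini-batch loss.
    losses = [float(np.sum(p ** 2)) for p in params_list]
    aggregated = weighted_aggregate(params_list, losses, beta=2.0)
    print("weights:", np.round(boltzmann_weights(losses, beta=2.0), 3))
    print("aggregated parameters:", np.round(aggregated, 3))
```

Subtracting the maximum score before exponentiation is a standard numerical-stability device and does not change the normalized weights.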
Pages: 5037-5050
Page count: 14
Related Papers
(50 in total)
  • [1] Chen, Zhen-Hong; Lan, Yan-Yan; Guo, Jia-Feng; Cheng, Xue-Qi. Distributed stochastic gradient descent with discriminative aggregating. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38(10): 2054-2063.
  • [2] Tian, Yingjie; Zhang, Yuqi; Zhang, Haibin. Recent Advances in Stochastic Gradient Descent in Deep Learning. Mathematics, 2023, 11(3).
  • [3] Elahi, Fatemeh; Fazlali, Mahmood; Malazi, Hadi Tabatabaee; Elahi, Mehdi. Parallel Fractional Stochastic Gradient Descent With Adaptive Learning for Recommender Systems. IEEE Transactions on Parallel and Distributed Systems, 2024, 35(3): 470-483.
  • [4] Needell, Deanna; Ward, Rachel. Batched Stochastic Gradient Descent with Weighted Sampling. Approximation Theory XV, 2017, 201: 279-306.
  • [5] Shi, Shaohuai; Wang, Qiang; Chu, Xiaowen; Li, Bo. A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning. 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS 2018), 2018: 425-432.
  • [6] Teng, Yunfei; Gao, Wenbo; Chalus, Francois; Choromanska, Anna; Goldfarb, Donald; Weller, Adrian. Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019, 32.
  • [7] Lian, Xiangru; Zhang, Wei; Zhang, Ce; Liu, Ji. Asynchronous Decentralized Parallel Stochastic Gradient Descent. International Conference on Machine Learning, Vol. 80, 2018.
  • [8] Backstrom, Karl; Papatriantafilou, Marina; Tsigas, Philippas. The Impact of Synchronization in Parallel Stochastic Gradient Descent. Distributed Computing and Intelligent Technology (ICDCIT 2022), 2022, 13145: 60-75.
  • [9] Hagedorn, Melinda; Jarre, Florian. Optimized convergence of stochastic gradient descent by weighted averaging. Optimization Methods & Software, 2024, 39(4): 699-724.
  • [10] Ayache, G.; El Rouayheb, S. Private weighted random walk stochastic gradient descent. IEEE Journal on Selected Areas in Information Theory, 2021, 2(1): 452-463.