Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Cited by: 1
Authors
Cong, Guojing [1 ]
Liu, Tianyi [2 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Ossining, NY 10562 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
10.1109/MLHPCAI4S51975.2020.00011
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Momentum methods have been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model-averaging approaches. At the individual learner level, traditional stochastic gradient descent is applied. At the meta level (the global learner level), a single momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
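The abstract does not give the exact update rule, so the following is a minimal sketch, assuming block momentum is applied to the averaged K-step update at the global level while each learner runs plain SGD locally. Names such as grad_fn, block_momentum, lr, and local_sgd_steps are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch of K-step averaging with a global ("block") momentum term.
    # Assumptions: each learner runs K plain SGD steps, models are averaged,
    # and momentum is applied to the averaged round-level update.
    import numpy as np

    def local_sgd_steps(w, data_batches, lr, grad_fn):
        """Run K plain SGD steps from the current global model on one learner."""
        w = w.copy()
        for batch in data_batches:          # len(data_batches) == K
            w -= lr * grad_fn(w, batch)
        return w

    def block_momentum_round(w_global, velocity, learner_batches, lr, grad_fn,
                             block_momentum=0.9):
        """One global round: local SGD on every learner, average the models,
        then apply momentum to the aggregated 'block' update."""
        local_models = [local_sgd_steps(w_global, batches, lr, grad_fn)
                        for batches in learner_batches]
        w_avg = np.mean(local_models, axis=0)     # K-step model averaging
        block_update = w_avg - w_global           # progress made this round
        velocity = block_momentum * velocity + block_update
        w_global = w_global + velocity            # momentum-accelerated global step
        return w_global, velocity

In this sketch the local optimizer stays unmodified; acceleration comes only from reusing the velocity accumulated over successive averaging rounds at the global level.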
Pages: 29-39
Number of pages: 11