Distributed stochastic gradient tracking methods with momentum acceleration for non-convex optimization

Cited by: 0
Authors
Juan Gao
Xin-Wei Liu
Yu-Hong Dai
Yakui Huang
Junhua Gu
Affiliations
[1] Hebei University of Technology, School of Artificial Intelligence
[2] Hebei University of Technology, Institute of Mathematics
[3] Chinese Academy of Sciences, LSEC, ICMSEC, Academy of Mathematics and Systems Science
[4] University of Chinese Academy of Sciences, School of Mathematical Sciences
Keywords
Distributed non-convex optimization; Machine learning; Momentum methods; Optimization algorithms; Convergence rate
DOI: Not available
Abstract
We consider a distributed non-convex optimization problem of minimizing the sum of local cost functions over a network of agents. This problem often arises in large-scale distributed machine learning as non-convex empirical risk minimization. In this paper, we propose two accelerated algorithms, DSGT-HB and DSGT-NAG, which combine the distributed stochastic gradient tracking (DSGT) method with momentum acceleration techniques. Under appropriate assumptions, we prove that both algorithms converge sublinearly to a neighborhood of a first-order stationary point of the distributed non-convex optimization problem. Moreover, we derive conditions under which DSGT-HB and DSGT-NAG achieve a network-independent linear speedup. Numerical experiments on a distributed non-convex logistic regression problem with real data sets and on a deep neural network trained on the MNIST database demonstrate the superiority of DSGT-HB and DSGT-NAG over DSGT.
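The record does not reproduce the algorithms' update rules, but a minimal sketch can illustrate the two ingredients the abstract names: distributed stochastic gradient tracking and a momentum term (heavy-ball, as the -HB suffix suggests). Everything in the sketch below is an assumption made for illustration, not the paper's DSGT-HB scheme: the toy least-squares data, the ring-graph mixing matrix W, the values of the step size alpha and momentum weight beta, and the exact placement of the momentum term.

```python
# Illustrative sketch only (not the paper's exact DSGT-HB update): distributed
# stochastic gradient tracking with a heavy-ball momentum term, run on a toy
# least-squares problem over a ring network of agents.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, m = 8, 5, 200            # agents, variable dimension, samples per agent

# Local data: agent i holds (A_i, b_i); its local cost is the mean squared
# residual f_i(x) = ||A_i x - b_i||^2 / (2 m), and the global cost is the sum.
x_true = rng.normal(size=dim)
A = [rng.normal(size=(m, dim)) for _ in range(n_agents)]
b = [Ai @ x_true + 0.1 * rng.normal(size=m) for Ai in A]

def stochastic_grad(i, x, batch=10):
    """Mini-batch stochastic gradient of f_i at x."""
    idx = rng.choice(m, size=batch, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / batch

# Doubly stochastic mixing matrix for a ring graph (lazy Metropolis-style weights).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

alpha, beta, iters = 0.05, 0.6, 400      # step size and momentum weight are assumed values
x = np.zeros((n_agents, dim))            # rows are the local iterates x_i
x_prev = x.copy()
g = np.stack([stochastic_grad(i, x[i]) for i in range(n_agents)])
y = g.copy()                             # gradient trackers, initialized to the first local gradients

for _ in range(iters):
    # Consensus averaging + descent along the tracked gradient + heavy-ball momentum.
    x_next = W @ x - alpha * y + beta * (x - x_prev)
    g_next = np.stack([stochastic_grad(i, x_next[i]) for i in range(n_agents)])
    # Gradient tracking: each y_i follows an estimate of the network-average gradient.
    y = W @ y + g_next - g
    x_prev, x, g = x, x_next, g_next

x_bar = x.mean(axis=0)
print("consensus error :", np.linalg.norm(x - x_bar))
print("error vs x_true :", np.linalg.norm(x_bar - x_true))
```

The tracking variable y is what lets each agent descend along an estimate of the network-average stochastic gradient rather than only its own, which is the defining feature of DSGT-style methods; the beta * (x - x_prev) term is the heavy-ball acceleration, and a Nesterov-style variant in the spirit of DSGT-NAG would instead evaluate the gradient at an extrapolated point.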
Pages: 531 - 572
Number of pages: 41
Related papers
50 records in total
  • [31] Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization
    Bianchi, Pascal
    Jakubowicz, Jeremie
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2013, 58 (02) : 391 - 405
  • [32] Stochastic variable metric proximal gradient with variance reduction for non-convex composite optimization
    Fort, Gersende
    Moulines, Eric
    [J]. STATISTICS AND COMPUTING, 2023, 33 (03)
  • [33] Stochastic proximal quasi-Newton methods for non-convex composite optimization
    Wang, Xiaoyu
    Wang, Xiao
    Yuan, Ya-xiang
    [J]. OPTIMIZATION METHODS & SOFTWARE, 2019, 34 (05) : 922 - 948
  • [35] Stochastic Gradient Tracking Methods for Distributed Personalized Optimization over Networks
    Huang, Yan
    Xu, Jinming
    Meng, Wenchao
    Wai, Hoi-To
    [J]. 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4571 - 4578
  • [36] Stochastic Network Optimization with Non-Convex Utilities and Costs
    Neely, Michael J.
    [J]. 2010 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2010, : 352 - 361
  • [37] On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization
    Xu, Yi
    Yuan, Zhuoning
    Yang, Sen
    Jin, Rong
    Yang, Tianbao
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4003 - 4009
  • [38] Stochastic Gradient Hamiltonian Monte Carlo for non-convex learning
    Chau, Huy N.
    Rasonyi, Miklos
    [J]. STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2022, 149 : 341 - 368
  • [39] Scaling up stochastic gradient descent for non-convex optimisation
    Mohamad, Saad
    Alamri, Hamad
    Bouchachia, Abdelhamid
    [J]. MACHINE LEARNING, 2022, 111 (11) : 4039 - 4079