Distributed stochastic gradient tracking methods with momentum acceleration for non-convex optimization

Cited by: 0
Authors
Juan Gao
Xin-Wei Liu
Yu-Hong Dai
Yakui Huang
Junhua Gu
Affiliations
[1] Hebei University of Technology, School of Artificial Intelligence
[2] Hebei University of Technology, Institute of Mathematics
[3] Chinese Academy of Sciences, LSEC, ICMSEC, Academy of Mathematics and Systems Science
[4] University of Chinese Academy of Sciences, School of Mathematical Sciences
Keywords
Distributed non-convex optimization; Machine learning; Momentum methods; Optimization algorithms; Convergence rate
DOI: Not available
Abstract
We consider a distributed non-convex optimization problem of minimizing the sum of local cost functions over a network of agents. This problem often arises in large-scale distributed machine learning as non-convex empirical risk minimization. In this paper, we propose two accelerated algorithms, DSGT-HB and DSGT-NAG, which combine the distributed stochastic gradient tracking (DSGT) method with momentum acceleration techniques. Under appropriate assumptions, we prove that both algorithms converge sublinearly to a neighborhood of a first-order stationary point of the distributed non-convex optimization problem. Moreover, we derive conditions under which DSGT-HB and DSGT-NAG achieve a network-independent linear speedup. Numerical experiments on a distributed non-convex logistic regression problem with real data sets and on a deep neural network trained on the MNIST database demonstrate the superiority of DSGT-HB and DSGT-NAG over DSGT.
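The record does not reproduce the algorithms' update rules, but a minimal sketch can illustrate the two ingredients the abstract names: distributed stochastic gradient tracking and a momentum term (heavy-ball, as the -HB suffix suggests). Everything in the sketch below is an assumption made for illustration, not the paper's DSGT-HB scheme: the toy least-squares data, the ring-graph mixing matrix W, the values of the step size alpha and momentum weight beta, and the exact placement of the momentum term.

```python
# Illustrative sketch only (not the paper's exact DSGT-HB update): distributed
# stochastic gradient tracking with a heavy-ball momentum term, run on a toy
# least-squares problem over a ring network of agents.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, m = 8, 5, 200            # agents, variable dimension, samples per agent

# Local data: agent i holds (A_i, b_i); its local cost is the mean squared
# residual f_i(x) = ||A_i x - b_i||^2 / (2 m), and the global cost is the sum.
x_true = rng.normal(size=dim)
A = [rng.normal(size=(m, dim)) for _ in range(n_agents)]
b = [Ai @ x_true + 0.1 * rng.normal(size=m) for Ai in A]

def stochastic_grad(i, x, batch=10):
    """Mini-batch stochastic gradient of f_i at x."""
    idx = rng.choice(m, size=batch, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / batch

# Doubly stochastic mixing matrix for a ring graph (lazy Metropolis-style weights).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

alpha, beta, iters = 0.05, 0.6, 400      # step size and momentum weight are assumed values
x = np.zeros((n_agents, dim))            # rows are the local iterates x_i
x_prev = x.copy()
g = np.stack([stochastic_grad(i, x[i]) for i in range(n_agents)])
y = g.copy()                             # gradient trackers, initialized to the first local gradients

for _ in range(iters):
    # Consensus averaging + descent along the tracked gradient + heavy-ball momentum.
    x_next = W @ x - alpha * y + beta * (x - x_prev)
    g_next = np.stack([stochastic_grad(i, x_next[i]) for i in range(n_agents)])
    # Gradient tracking: each y_i follows an estimate of the network-average gradient.
    y = W @ y + g_next - g
    x_prev, x, g = x, x_next, g_next

x_bar = x.mean(axis=0)
print("consensus error :", np.linalg.norm(x - x_bar))
print("error vs x_true :", np.linalg.norm(x_bar - x_true))
```

The tracking variable y is what lets each agent descend along an estimate of the network-average stochastic gradient rather than only its own, which is the defining feature of DSGT-style methods; the beta * (x - x_prev) term is the heavy-ball acceleration, and a Nesterov-style variant in the spirit of DSGT-NAG would instead evaluate the gradient at an extrapolated point.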
Pages: 531 - 572
Number of pages: 41
Related papers
50 records in total
  • [31] Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization
    Bianchi, Pascal
    Jakubowicz, Jeremie
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2013, 58 (02) : 391 - 405
  • [32] Stochastic variable metric proximal gradient with variance reduction for non-convex composite optimization
    Fort, Gersende
    Moulines, Eric
    [J]. STATISTICS AND COMPUTING, 2023, 33 (03)
  • [33] Stochastic proximal quasi-Newton methods for non-convex composite optimization
    Wang, Xiaoyu
    Wang, Xiao
    Yuan, Ya-xiang
    [J]. OPTIMIZATION METHODS & SOFTWARE, 2019, 34 (05) : 922 - 948
  • [35] Stochastic Gradient Tracking Methods for Distributed Personalized Optimization over Networks
    Huang, Yan
    Xu, Jinming
    Meng, Wenchao
    Wai, Hoi-To
    [J]. 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4571 - 4578
  • [36] Stochastic Network Optimization with Non-Convex Utilities and Costs
    Neely, Michael J.
    [J]. 2010 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2010, : 352 - 361
  • [37] On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization
    Xu, Yi
    Yuan, Zhuoning
    Yang, Sen
    Jin, Rong
    Yang, Tianbao
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4003 - 4009
  • [38] Stochastic Gradient Hamiltonian Monte Carlo for non-convex learning
    Chau, Huy N.
    Rasonyi, Miklos
    [J]. STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2022, 149 : 341 - 368
  • [39] Scaling up stochastic gradient descent for non-convex optimisation
    Mohamad, Saad
    Alamri, Hamad
    Bouchachia, Abdelhamid
    [J]. MACHINE LEARNING, 2022, 111 (11) : 4039 - 4079