Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima

Cited by: 0
Authors
Swenson, Brian [1 ]
Murray, Ryan [2 ]
Poor, H. Vincent [3 ]
Kar, Soummya [4 ]
Affiliations
[1] Penn State Univ, Appl Res Lab, State Coll, PA 16801 USA
[2] North Carolina State Univ, Dept Math, Raleigh, NC 27695 USA
[3] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation
Keywords
Nonconvex optimization; distributed optimization; stochastic optimization; saddle point; gradient descent; OPTIMIZATION; ALGORITHM; ADAPTATION; CONVEX; NETWORKS
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Gradient-descent (GD) based algorithms are an indispensable tool for optimizing modern machine learning models. This paper considers distributed stochastic GD (D-SGD), a network-based variant of GD in which agents cooperate over a communication graph. Distributed algorithms play an important role in large-scale machine learning as well as in the Internet of Things (IoT) and related applications. The paper addresses two main issues. First, we study convergence of D-SGD to critical points when the loss function is nonconvex and nonsmooth. We consider a broad class of nonsmooth loss functions, including those of practical interest in modern deep learning. It is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Second, we consider the problem of avoiding saddle points. It is well known that classical GD avoids saddle points; however, analogous results have been absent for distributed variants of GD. For this problem, we again allow the loss functions to be nonconvex and nonsmooth, requiring only that they be smooth in a neighborhood of each saddle point. It is shown that, for any fixed initialization, D-SGD avoids such saddle points with probability one. Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation.
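To make the algorithm concrete, below is a minimal, illustrative sketch of a D-SGD-style update: each agent mixes its iterate with its neighbors' iterates (consensus) and takes a noisy local gradient step with decaying step sizes. This is a generic consensus-plus-gradient scheme of the kind analyzed in this literature, not the authors' exact formulation; the ring graph, the quartic local losses, the step-size exponents, and the helper local_grad are all assumptions chosen for illustration.

```python
import numpy as np

# Illustrative D-SGD-style update (generic consensus + stochastic gradient
# scheme; losses, graph, and step sizes below are assumptions, not the
# paper's exact setup):
#   x_i(t+1) = x_i(t) - beta_t * sum_{j in N(i)} (x_i(t) - x_j(t))
#              - alpha_t * (grad f_i(x_i(t)) + noise)

rng = np.random.default_rng(0)
n_agents, dim, T = 5, 2, 5000

# Hypothetical nonconvex local losses: f_i(x) = ||x||^4 / 4 - b_i . x
b = rng.normal(size=(n_agents, dim))

def local_grad(i, x):
    return np.dot(x, x) * x - b[i]   # grad of ||x||^4 / 4 is ||x||^2 * x

# Ring communication graph: agent i talks to i-1 and i+1 (mod n_agents)
neighbors = [((i - 1) % n_agents, (i + 1) % n_agents) for i in range(n_agents)]

x = rng.normal(size=(n_agents, dim))       # independent random initializations
for t in range(1, T + 1):
    alpha = 0.5 / t                        # gradient step size (non-summable)
    beta = 0.3 / t**0.51                   # consensus weight, decays more slowly
    x_new = np.empty_like(x)
    for i in range(n_agents):
        consensus = sum(x[i] - x[j] for j in neighbors[i])
        noise = rng.normal(scale=0.1, size=dim)   # stochastic gradient noise
        x_new[i] = x[i] - beta * consensus - alpha * (local_grad(i, x[i]) + noise)
    x = x_new

print(x)   # agents' iterates should (roughly) agree near a critical point
```

Roughly speaking, the consensus term forces the agents to agree while the gradient term drives the agreed-upon iterate along the gradient flow of the average of the local losses; the ODE method then transfers properties of this flow, such as convergence to critical points and saddle-point avoidance, to the discrete stochastic iterates.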
Pages: 62