Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Citations: 0
Authors
Pascal Bianchi
Walid Hachem
Sholom Schechtman
Affiliations
[1] Telecom Paris, LTCI
[2] Université Gustave Eiffel, LIGM, CNRS
Source
Set-Valued and Variational Analysis, 2022, 30(3)
Keywords
Clarke subdifferential; Backpropagation algorithm; Differential inclusions; Non-convex and non-smooth optimization; Stochastic approximation; 34A60; 65K05; 65K10; 90C15
DOI
Not available
Abstract
This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent algorithm for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular among practitioners and whose properties have recently been studied by Bolte and Pauwels. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential of the mean function, it has been assumed in the literature that an oracle of the Clarke subdifferential of the mean function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the sense of compact convergence) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function.
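For orientation, here is a minimal sketch of the scheme the abstract describes, in notation chosen for this note rather than taken from the paper (iterate x_n, constant step size \gamma, random sample \xi, per-sample function f, mean function F, and v the chosen substitute for the gradient):

    x_{n+1} = x_n - \gamma \, v(x_n, \xi_{n+1}), \qquad v(x, \xi) \in \partial f(x, \xi) \ \text{(Clarke subdifferential), or } v(x,\xi) = \text{backpropagation output},

and the subgradient flow that the interpolated trajectory tracks in the small step size regime is the differential inclusion

    \dot{x}(t) \in -\partial F(x(t)), \qquad F(x) = \mathbb{E}_{\xi}\big[ f(x, \xi) \big].

As a purely illustrative companion (hypothetical, not from the paper), the following PyTorch fragment runs constant-step SGD in which the update direction is the backpropagation output on a non-smooth, non-convex sample loss min(|x - \xi|, 1):

    import torch

    x = torch.tensor(2.0, requires_grad=True)  # initialization point
    gamma = 0.01                                # constant step size
    opt = torch.optim.SGD([x], lr=gamma)

    for _ in range(1000):
        xi = torch.randn(())                    # random sample xi
        # non-smooth, non-convex, locally Lipschitz sample function
        loss = torch.minimum(torch.abs(x - xi), torch.tensor(1.0))
        opt.zero_grad()
        loss.backward()                         # backpropagation output plays the role of v(x, xi)
        opt.step()                              # x <- x - gamma * v(x, xi)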
Pages: 1117-1147
Number of pages: 30
Related papers
50 records in total
  • [1] Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
    Bianchi, Pascal
    Hachem, Walid
    Schechtman, Sholom
    [J]. SET-VALUED AND VARIATIONAL ANALYSIS, 2022, 30 (03) : 1117 - 1147
  • [2] Convergence Rates for the Stochastic Gradient Descent Method for Non-Convex Objective Functions
    Fehrman, Benjamin
    Gess, Benjamin
    Jentzen, Arnulf
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [3] Almost sure convergence of stochastic composite objective mirror descent for non-convex non-smooth optimization
    Liang, Yuqing
    Xu, Dongpo
    Zhang, Naimin
    Mandic, Danilo P.
    [J]. OPTIMIZATION LETTERS, 2023
  • [4] Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
    Tan, Yan Shuo
    Vershynin, Roman
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [5] Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence
    Xu, Yi
    Qi, Qi
    Lin, Qihang
    Jin, Rong
    Yang, Tianbao
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [6] On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization
    Xu, Yi
    Yuan, Zhuoning
    Yang, Sen
    Jin, Rong
    Yang, Tianbao
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4003 - 4009
  • [7] Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization
    Metel, Michael R.
    Takeda, Akiko
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [8] On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
    Mertikopoulos, Panayotis
    Hallak, Nadav
    Kavis, Ali
    Cevher, Volkan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [9] Fast Proximal Gradient Descent for A Class of Non-convex and Non-smooth Sparse Learning Problems
    Yang, Yingzhen
    Yu, Jiahui
    [J]. 35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 1253 - 1262