Fast Convergence of Random Reshuffling Under Over-Parameterization and the Polyak-Lojasiewicz Condition

Cited: 1
Authors
Fan, Chen [1 ]
Thrampoulidis, Christos [2 ]
Schmidt, Mark [1 ,3 ]
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
[3] Canada CIFAR AI Chair Amii, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
OPTIMIZATION;
DOI
10.1007/978-3-031-43421-1_18
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Modern machine learning models are often over-parameterized and can therefore interpolate the training data. In this setting, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD, which samples data with replacement at every iteration, RR chooses a random permutation of the data at the beginning of each epoch and, at each iteration, takes the next sample from that permutation. For under-parameterized models, it has been shown that RR can converge faster than SGD under certain assumptions. However, previous works do not show that RR outperforms SGD in over-parameterized settings except in some highly restrictive scenarios. For the class of Polyak-Lojasiewicz (PL) functions, we show that RR can outperform SGD in over-parameterized settings when either of the following holds: (i) the number of samples (n) is less than the product of the condition number (kappa) and the parameter (alpha) of a weak growth condition (WGC), or (ii) n is less than the parameter (rho) of a strong growth condition (SGC).
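To make the sampling difference concrete, the following is a minimal sketch of the two schemes the abstract contrasts, run on a toy over-parameterized least-squares problem where the labels are exactly realizable (the interpolation regime). All names, the step size, and the problem dimensions are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def grad(w, x, y):
    # Per-sample gradient of the squared loss 0.5 * (x @ w - y)**2.
    return (x @ w - y) * x

def sgd_epoch(w, X, y, lr, rng):
    # With-replacement SGD: each of the n steps draws an index uniformly,
    # so a sample may be visited several times (or not at all) in an epoch.
    n = len(y)
    for _ in range(n):
        i = rng.integers(n)
        w = w - lr * grad(w, X[i], y[i])
    return w

def rr_epoch(w, X, y, lr, rng):
    # Random reshuffling (RR): draw one permutation per epoch, then take
    # the samples in that order, visiting each exactly once.
    for i in rng.permutation(len(y)):
        w = w - lr * grad(w, X[i], y[i])
    return w

# Toy over-parameterized least squares: more parameters (d) than samples (n),
# so an interpolating zero-loss solution exists.
rng = np.random.default_rng(0)
n, d = 10, 50
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)  # labels exactly realizable by a linear model

w = np.zeros(d)
for _ in range(200):
    w = rr_epoch(w, X, y, lr=0.01, rng=rng)
print(float(np.mean((X @ w - y) ** 2)))  # training loss driven toward zero
```

Swapping `rr_epoch` for `sgd_epoch` in the loop gives the with-replacement baseline; the paper's results concern how fast each drives the loss to zero under the PL condition and growth conditions, not which one converges at all.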
Pages: 301-315
Page count: 15