Fast Convergence of Random Reshuffling Under Over-Parameterization and the Polyak-Lojasiewicz Condition

Cited by: 1
Authors
Fan, Chen [1]
Thrampoulidis, Christos [2]
Schmidt, Mark [1,3]
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
[3] Canada CIFAR AI Chair Amii, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
OPTIMIZATION;
DOI
10.1007/978-3-031-43421-1_18
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Modern machine learning models are often over-parameterized and can therefore interpolate the training data. In this setting, we study the convergence properties of a without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD, which samples data with replacement at every iteration, RR draws a random permutation of the data at the beginning of each epoch and then processes the samples in that order. For under-parameterized models, it has been shown that RR can converge faster than SGD under certain assumptions. However, previous works do not show that RR outperforms SGD in over-parameterized settings except in some highly restrictive scenarios. For the class of Polyak-Lojasiewicz (PL) functions, we show that RR can outperform SGD in over-parameterized settings when either of the following holds: (i) the number of samples (n) is less than the product of the condition number (kappa) and the parameter (alpha) of a weak growth condition (WGC), or (ii) n is less than the parameter (rho) of a strong growth condition (SGC).
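The two sampling schemes contrasted in the abstract can be made concrete with a minimal sketch (not the authors' code): with-replacement SGD versus random reshuffling on an over-parameterized least-squares problem, which interpolates the data and satisfies the PL condition. The problem size, step size, and epoch count below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: with-replacement SGD vs. random reshuffling (RR)
# on an over-parameterized least-squares problem. With n < d and a
# consistent system, the model interpolates the data and the objective
# satisfies the PL condition. Sizes and step size are assumptions.

rng = np.random.default_rng(0)
n, d = 20, 50                        # n < d: over-parameterized
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                       # consistent system -> zero loss attainable

def loss(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def grad_i(x, i):
    # Gradient of the i-th sample loss 0.5 * (a_i^T x - b_i)^2.
    return (A[i] @ x - b[i]) * A[i]

def sgd(x0, step, epochs):
    # With-replacement SGD: sample an index uniformly at every iteration.
    x = x0.copy()
    for _ in range(epochs * n):
        x -= step * grad_i(x, rng.integers(n))
    return x

def random_reshuffling(x0, step, epochs):
    # RR: draw a fresh random permutation at the start of each epoch,
    # then take one step per sample in that order.
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= step * grad_i(x, i)
    return x

x0 = np.zeros(d)
step, epochs = 0.01, 50
print("final loss, SGD:", loss(sgd(x0, step, epochs)))
print("final loss, RR :", loss(random_reshuffling(x0, step, epochs)))
```

Per the abstract, RR's advantage over SGD in this interpolation regime is expected when n is small relative to kappa * alpha (under the WGC) or to rho (under the SGC).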
Pages: 301 - 315
Number of pages: 15