Fast Convergence of Random Reshuffling Under Over-Parameterization and the Polyak-Lojasiewicz Condition
Cited by: 1
Authors:
Fan, Chen [1]; Thrampoulidis, Christos [2]; Schmidt, Mark [1,3]
Affiliations:
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC, Canada
[3] Canada CIFAR AI Chair Amii, Montreal, PQ, Canada
Source:
Funding:
Natural Sciences and Engineering Research Council of Canada;
Keywords:
OPTIMIZATION;
DOI:
10.1007/978-3-031-43421-1_18
Chinese Library Classification:
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes:
081104; 0812; 0835; 1405;
Abstract:
Modern machine learning models are often over-parameterized and, as a result, can interpolate the training data. In this scenario, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD, which samples data with replacement at every iteration, RR chooses a random permutation of the data at the beginning of each epoch and, at each iteration, takes the next sample from that permutation. For under-parameterized models, it has been shown that RR can converge faster than SGD under certain assumptions. However, previous works do not show that RR outperforms SGD in over-parameterized settings except in some highly restrictive scenarios. For the class of Polyak-Lojasiewicz (PL) functions, we show that RR can outperform SGD in over-parameterized settings when either of the following holds: (i) the number of samples (n) is less than the product of the condition number (kappa) and the parameter (alpha) of a weak growth condition (WGC), or (ii) n is less than the parameter (rho) of a strong growth condition (SGC).
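For illustration only (not part of the published record), the following is a minimal Python sketch of the sampling difference the abstract describes: SGD draws an index with replacement at every step, whereas RR draws a fresh permutation at the start of each epoch and visits every sample exactly once. All names (sgd_epoch, rr_epoch, lsq_grad) and the over-parameterized least-squares setup are hypothetical choices for the example, not taken from the paper.

import numpy as np

def sgd_epoch(w, X, y, grad_fn, lr, rng):
    # With-replacement SGD: each of the n steps draws an index uniformly at random,
    # so some samples may be revisited and others skipped within one pass.
    n = len(y)
    for _ in range(n):
        i = rng.integers(n)
        w = w - lr * grad_fn(w, X[i], y[i])
    return w

def rr_epoch(w, X, y, grad_fn, lr, rng):
    # Random reshuffling (RR): draw one random permutation per epoch and
    # process every sample exactly once, in that order.
    for i in rng.permutation(len(y)):
        w = w - lr * grad_fn(w, X[i], y[i])
    return w

def lsq_grad(w, x_i, y_i):
    # Per-sample gradient of 0.5 * (x_i @ w - y_i) ** 2 (hypothetical least-squares loss).
    return (x_i @ w - y_i) * x_i

rng = np.random.default_rng(0)
n, d = 20, 50                      # d > n: over-parameterized, so an interpolating solution exists
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = np.zeros(d)
for _ in range(200):
    w = rr_epoch(w, X, y, lsq_grad, lr=0.01, rng=rng)
print("training loss:", 0.5 * np.mean((X @ w - y) ** 2))

Running the loop with sgd_epoch in place of rr_epoch gives the with-replacement baseline; the paper's analysis concerns how fast such iterates approach an interpolating solution under the PL condition and the growth conditions listed above.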
Pages: 301-315
Number of pages: 15