Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning

Times Cited: 1
Author
Yang, Zhuang [1 ,2 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Sun Yat-sen Univ, Sch Elect & Commun Engn, Guangzhou 510275, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Adaptive step size; machine learning; mini-batches; stochastic conjugate gradient (SCG); variance reduction; MINI-BATCH ALGORITHMS; SIZE SELECTION; SEARCH;
DOI
10.1109/TNNLS.2023.3280826
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conjugate gradient (CG), an effective technique for accelerating gradient descent, has shown great potential and is widely used for large-scale machine-learning problems. However, CG and its variants were not devised for the stochastic setting, which makes them extremely unstable and can even lead to divergence when noisy gradients are used. This article develops a novel class of stable stochastic CG (SCG) algorithms with a faster convergence rate, obtained via a variance-reduction technique and an adaptive step-size rule in the mini-batch setting. Specifically, in place of a line search, which is time-consuming in CG-type approaches and can even fail for SCG, this article uses the random stabilized Barzilai-Borwein (RSBB) method to obtain the step size online. We rigorously analyze the convergence properties of the proposed algorithms and show that they attain a linear convergence rate in both the strongly convex and nonconvex settings. We also show that the total complexity of the proposed algorithms matches that of modern stochastic optimization algorithms under different cases. Extensive numerical experiments on machine-learning problems demonstrate that the proposed algorithms outperform state-of-the-art stochastic optimization algorithms.
Pages: 1-14
Page Count: 14
Related Papers
50 records total
  • [1] Large-scale machine learning with fast and stable stochastic conjugate gradient
    Yang, Zhuang
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 173
  • [2] Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
    Yang, Zhuang
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (06) : 1598 - 1606
  • [3] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    [J]. COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [4] An online conjugate gradient algorithm for large-scale data analysis in machine learning
    Xue, Wei
    Wan, Pengcheng
    Li, Qiao
    Zhong, Ping
    Yu, Gaohang
    Tao, Tao
    [J]. AIMS MATHEMATICS, 2021, 6 (02): : 1515 - 1537
  • [5] Adaptive stochastic conjugate gradient for machine learning
    Yang, Zhuang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 206
  • [7] MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING
    Wiesler, Simon
    Richard, Alexander
    Schlueter, Ralf
    Ney, Hermann
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [8] Stochastic Conjugate Gradient Descent Twin Support Vector Machine for Large Scale Pattern Classification
    Sharma, Sweta
    Rastogi, Reshma
    [J]. AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 590 - 602
  • [9] Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning
    Yang, Zhuang
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [10] Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning
    Liu, Yuanyuan
    Shang, Fanhua
    Liu, Hongying
    Kong, Lin
    Jiao, Licheng
    Lin, Zhouchen
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4242 - 4255