Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning

Cited: 0
Authors
Yang, Zhuang [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Powerball function; stochastic optimization; variance reduction; adaptive learning rate; non-convex optimization; REGULARIZATION; DESCENT; STEP;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Stochastic optimization, especially stochastic gradient descent (SGD), is now the workhorse for the vast majority of machine learning problems. Various strategies, e.g., control variates, adaptive learning rates, and momentum techniques, have been developed to improve canonical SGD, which suffers from a low convergence rate and poor generalization in practice. Most of these strategies improve SGD either by controlling the update direction (e.g., the gradient descent or gradient ascent direction) or by manipulating the learning rate. Along these two lines, this work first develops and analyzes a novel type of improved powered stochastic gradient descent algorithm from the perspective of variance reduction, where the update direction is determined by the Powerball function. Additionally, to bridge the gap between powered stochastic optimization (PSO) and the learning rate, which remains an open problem for PSO, we propose an adaptive mechanism for updating the learning rate that resorts to a Barzilai-Borwein (BB)-like scheme, not only for the proposed algorithm but also for classical PSO algorithms. The theoretical properties of the resulting algorithms for non-convex optimization problems are analyzed. Empirical tests on various benchmark data sets demonstrate the efficiency and robustness of the proposed algorithms.
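To make the core idea concrete: the Powerball function applies an elementwise power nonlinearity, sign(g)·|g|^γ with γ ∈ (0, 1), to the stochastic gradient before the update step. The following is a minimal illustrative sketch, not the paper's actual algorithm; the function names, the toy quadratic objective, and the hyperparameter values are all assumptions chosen for demonstration, and the paper's variance-reduction and BB step-size mechanisms are omitted.

```python
import numpy as np


def powerball(g, gamma):
    """Powerball function: elementwise sign(g) * |g|**gamma, gamma in (0, 1)."""
    return np.sign(g) * np.abs(g) ** gamma


def powered_sgd(grad_fn, w0, lr=0.1, gamma=0.6, n_iters=200, seed=0):
    """Plain powered SGD sketch: the update direction is the Powerball
    transform of a noisy gradient estimate (no variance reduction here)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(n_iters):
        g = grad_fn(w, rng)          # stochastic gradient estimate
        w -= lr * powerball(g, gamma)  # powered update direction
    return w


# Toy problem (illustrative only): minimize 0.5 * ||w - target||^2
# with additive Gaussian noise standing in for mini-batch gradient noise.
target = np.array([1.0, -2.0, 3.0])


def noisy_grad(w, rng):
    return (w - target) + 0.01 * rng.standard_normal(w.shape)


w_star = powered_sgd(noisy_grad, np.zeros(3))
```

Because |g|^γ with γ < 1 amplifies small gradient components and damps large ones, the powered update can make faster initial progress than plain SGD on ill-scaled problems, which is the intuition the abstract's variance-reduced and BB-stepped variants build on.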
Pages: 29
Related Papers
50 records total
  • [32] Towards provably efficient quantum algorithms for large-scale machine-learning models
    Liu, Junyu
    Liu, Minzhao
    Liu, Jin-Peng
    Ye, Ziyu
    Wang, Yunfei
    Alexeev, Yuri
    Eisert, Jens
    Jiang, Liang
    [J]. NATURE COMMUNICATIONS, 2024, 15 (01)
  • [34] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4788 - 4789
  • [35] Large-scale kernel extreme learning machine
    Deng, Wan-Yu
    Zheng, Qing-Hua
    Chen, Lin
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2014, 37 (11): : 2235 - 2246
  • [36] Machine learning for large-scale MOF screening
    Coupry, Damien
    Groot, Laurens
    Addicoat, Matthew
    Heine, Thomas
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 253
  • [37] Large-Scale Machine Learning and Neuroimaging in Psychiatry
    Thompson, Paul
    [J]. BIOLOGICAL PSYCHIATRY, 2018, 83 (09) : S51 - S51
  • [38] Coding for Large-Scale Distributed Machine Learning
    Xiao, Ming
    Skoglund, Mikael
    [J]. ENTROPY, 2022, 24 (09)
  • [39] Robust Large-Scale Machine Learning in the Cloud
    Rendle, Steffen
    Fetterly, Dennis
    Shekita, Eugene J.
    Su, Bor-yiing
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1125 - 1134
  • [40] Large-scale Machine Learning over Graphs
    Yang, Yiming
    [J]. PROCEEDINGS OF THE 2018 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'18), 2018, : 9 - 9