Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization

Cited: 0
Authors
Liu, Jin [1 ]
Pan, Xiaokang [1 ]
Duan, Junwen [1 ]
Li, Hong-Dong [1 ]
Li, Youqi [2 ]
Qu, Zhe [1 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
(not available)
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper delves into stochastic optimization for compositional minimax optimization, a pivotal challenge across various machine learning domains, including deep AUC maximization and reinforcement learning policy evaluation. Despite its significance, compositional minimax optimization remains under-explored. Adding to the difficulty, current methods for this problem suffer from sub-optimal complexities or rely heavily on sizable batch sizes. To address these constraints, this paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which achieves the optimal sample complexity and obtains a near-accurate solution, matching existing minimax methods. We also demonstrate that NSTORM achieves the same sample complexity under the Polyak-Lojasiewicz (PL) condition, an insightful extension of its capabilities. However, NSTORM requires low learning rates, which may constrain its applicability in real-world machine learning. To overcome this hurdle, we present ADAptive NSTORM (ADA-NSTORM), which employs adaptive learning rates. We prove that ADA-NSTORM achieves the same sample complexity, while experimental results show that it is more effective in practice. All derived complexities indicate that our methods match the lower bounds of existing minimax optimization methods without requiring a large batch size at each iteration. Extensive experiments support the efficiency of the proposed methods.
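The variance-reduction mechanism underlying NSTORM is a STORM-style recursive momentum estimator, d_t = g(w_t; ξ_t) + (1 − β)(d_{t−1} − g(w_{t−1}; ξ_t)), where the same sample ξ_t is evaluated at both the new and the old iterate. The sketch below is only an illustration of that update on a toy, non-compositional minimax problem with hypothetical noisy gradient oracles; it is not the paper's NSTORM algorithm, and all names and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grads(x, y, n1, n2):
    # Noisy gradients of F(x, y) = x*y + 0.5*x**2 - 0.5*y**2 (saddle at (0, 0)).
    # Both evaluations within one iteration reuse the same noise sample (n1, n2),
    # which is what allows the recursive correction term to cancel variance.
    return y + x + n1, x - y + n2

x, y = 2.0, -2.0
lr, beta = 0.05, 0.1                        # step size and momentum parameter
n1, n2 = rng.normal(scale=0.1, size=2)
dx, dy = grads(x, y, n1, n2)                # initialize estimators with one sample
for _ in range(2000):
    x_new, y_new = x - lr * dx, y + lr * dy  # descent on x, ascent on y
    n1, n2 = rng.normal(scale=0.1, size=2)
    gx_new, gy_new = grads(x_new, y_new, n1, n2)
    gx_old, gy_old = grads(x, y, n1, n2)     # same sample at the previous point
    # STORM-style recursive momentum: SGD estimate plus a variance-reducing correction
    dx = gx_new + (1 - beta) * (dx - gx_old)
    dy = gy_new + (1 - beta) * (dy - gy_old)
    x, y = x_new, y_new

print(f"x ~ {x:.3f}, y ~ {y:.3f}")           # iterates settle near the saddle (0, 0)
```

Because the correction term (1 − β)(d_{t−1} − g(w_{t−1}; ξ_t)) reuses the current sample, the estimator error contracts geometrically instead of accumulating fresh noise at each step, which is why such methods avoid the large per-iteration batches mentioned in the abstract.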
Pages: 13927-13935 (9 pages)