Improved Variance Reduction Methods for Riemannian Non-Convex Optimization

Cited by: 4
Authors
Han, Andi [1 ]
Gao, Junbin [1 ]
Affiliations
[1] Univ Sydney, Business Sch, Discipline Business Analyt, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Complexity theory; Optimization; Manifolds; Convergence; Convex functions; Training; Principal component analysis; Riemannian optimization; non-convex optimization; online optimization; variance reduction; batch size adaptation;
DOI
10.1109/TPAMI.2021.3112139
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Variance reduction is a popular technique for accelerating gradient descent and stochastic gradient descent on optimization problems defined over both Euclidean spaces and Riemannian manifolds. This paper further improves on existing variance reduction methods for non-convex Riemannian optimization, including R-SVRG and R-SRG/R-SPIDER, by providing a unified framework for batch size adaptation. The framework is more general than existing work in that it accommodates retractions, vector transports, and mini-batch stochastic gradients. We show that the adaptive-batch variance reduction methods require lower gradient complexities for both general non-convex and gradient-dominated functions, under both finite-sum and online optimization settings. Moreover, under the new framework we complete the convergence analysis of R-SVRG and R-SRG, which is currently missing from the literature. We prove convergence of R-SVRG with a much simpler analysis, which leads to curvature-free complexity bounds. We also show improved results for R-SRG under double-loop convergence, which match the optimal complexities of R-SPIDER. In addition, we prove the first online complexity results for R-SVRG and R-SRG. Lastly, we discuss the potential of adapting the batch size for non-smooth, constrained, and second-order Riemannian optimizers. Extensive experiments on a variety of applications support the analysis and claims of the paper.
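
To make the R-SVRG estimator mentioned in the abstract concrete, the following is a minimal numerical sketch (not the authors' implementation) of Riemannian SVRG on the unit sphere, applied to leading-eigenvector (PCA) estimation, one of the applications named in the keywords. The retraction, the projection-based vector transport, the rsvrg routine, and all parameter values are illustrative assumptions, and the paper's batch size adaptation is omitted. The variance-reduced direction is v_t = grad f_{S_t}(x_t) - T_{x~ -> x_t}(grad f_{S_t}(x~) - grad f(x~)), followed by the retraction step x_{t+1} = R_{x_t}(-eta * v_t), where x~ is the snapshot point of the current outer loop.

import numpy as np

def proj(x, g):
    # Project a Euclidean gradient g onto the tangent space of the sphere at x.
    return g - (x @ g) * x

def retract(x, v):
    # Metric-projection retraction on the sphere: R_x(v) = (x + v) / ||x + v||.
    y = x + v
    return y / np.linalg.norm(y)

def transport(y, v):
    # Vector transport by orthogonal projection onto the tangent space at y.
    return v - (y @ v) * y

def rgrad(Z, x, idx=None):
    # Riemannian gradient of f_S(x) = -(1/|S|) * sum_{i in S} (z_i^T x)^2.
    Zs = Z if idx is None else Z[idx]
    egrad = -2.0 * Zs.T @ (Zs @ x) / len(Zs)
    return proj(x, egrad)

def rsvrg(Z, x0, eta=0.02, epochs=50, inner=100, batch=8, seed=0):
    rng = np.random.default_rng(seed)
    n, x = len(Z), x0
    for _ in range(epochs):
        x_ref, full = x, rgrad(Z, x)          # snapshot point and full gradient
        for _ in range(inner):
            idx = rng.integers(n, size=batch)
            # Variance-reduced direction: correct the mini-batch gradient by the
            # transported difference with the snapshot gradient.
            v = rgrad(Z, x, idx) - transport(x, rgrad(Z, x_ref, idx) - full)
            x = retract(x, -eta * v)
    return x

# Usage: recover the top eigenvector of the sample covariance of Z.
rng = np.random.default_rng(1)
Z = rng.standard_normal((1000, 20)) @ np.diag(np.linspace(2.0, 0.1, 20))
v0 = rng.standard_normal(20)
x = rsvrg(Z, x0=v0 / np.linalg.norm(v0))
top = np.linalg.eigh(Z.T @ Z / len(Z))[1][:, -1]
print("alignment with top eigenvector:", abs(x @ top))

Projection is used here both as the retraction and as the vector transport because both are cheap and well defined on the sphere; on other manifolds the paper's framework allows any retraction/transport pair satisfying its assumptions.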
Pages: 7610-7623
Page count: 14