Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
Australian Research Council;
Keywords
DOI
10.1137/1.9781611976236.23
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
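To make the class of methods concrete, the sketch below shows one plausible form of a sub-sampled trust-region Newton iteration: curvature is accessed only through Hessian-vector products computed on a random sub-sample, the TR subproblem is solved approximately with CG-Steihaug, and the radius is adapted from the ratio of actual to predicted decrease. This is a minimal illustrative sketch, not the authors' implementation; the toy tanh-regression objective, the function names (steihaug_cg, subsampled_tr, subsampled_hvp), the sample size, and the 0.25/0.75 radius-update thresholds are all assumptions made for exposition.

```python
import numpy as np

def steihaug_cg(g, hvp, radius, tol=1e-6, max_iter=50):
    """Approximately solve  min_s  g.s + 0.5 s.H s  s.t. ||s|| <= radius
    by truncated conjugate gradients (CG-Steihaug)."""
    s = np.zeros_like(g)
    r = g.copy()                      # residual of the model gradient, H s + g
    d = -r
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(max_iter):
        Hd = hvp(d)
        dHd = d @ Hd
        if dHd <= 0:                  # negative curvature: follow d to the boundary
            return s + _tau_to_boundary(s, d, radius) * d
        alpha = (r @ r) / dHd
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= radius:   # step would leave the trust region
            return s + _tau_to_boundary(s, d, radius) * d
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return s_next
        d = -r_next + ((r_next @ r_next) / (r @ r)) * d
        s, r = s_next, r_next
    return s

def _tau_to_boundary(s, d, radius):
    """Positive root tau of ||s + tau d|| = radius."""
    a, b, c = d @ d, 2 * (s @ d), s @ s - radius ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

def loss_and_grad(w, X, y):
    """Non-convex toy objective: 0.5 * mean((tanh(Xw) - y)^2)."""
    p = np.tanh(X @ w)
    e = p - y
    return 0.5 * np.mean(e ** 2), X.T @ (e * (1 - p ** 2)) / len(y)

def subsampled_hvp(w, X, y, idx):
    """Hessian-vector product of the toy objective, using only the rows in idx."""
    Xs, ys = X[idx], y[idx]
    p = np.tanh(Xs @ w)
    e = p - ys
    coeff = (1 - p ** 2) ** 2 - 2 * e * p * (1 - p ** 2)
    return lambda v: Xs.T @ (coeff * (Xs @ v)) / len(idx)

def subsampled_tr(X, y, sample_size=64, radius=1.0, eta=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.normal(size=X.shape[1])
    for _ in range(max_iter):
        loss, g = loss_and_grad(w, X, y)
        if np.linalg.norm(g) < 1e-8:
            break
        idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
        hvp = subsampled_hvp(w, X, y, idx)      # curvature from a sub-sample only
        s = steihaug_cg(g, hvp, radius)
        pred = -(g @ s + 0.5 * (s @ hvp(s)))    # decrease predicted by the model
        actual = loss - loss_and_grad(w + s, X, y)[0]
        rho = actual / max(pred, 1e-12)
        if rho > 0.75 and np.linalg.norm(s) > 0.9 * radius:
            radius *= 2.0                        # model is trustworthy: expand
        elif rho < 0.25:
            radius *= 0.25                       # poor agreement: shrink
        if rho > eta:
            w = w + s                            # accept the step
    return w

# Tiny usage example on synthetic data: y ~ tanh(X w_true) + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = np.tanh(X @ rng.normal(size=10)) + 0.01 * rng.normal(size=500)
w_hat = subsampled_tr(X, y)
print("final training loss:", loss_and_grad(w_hat, X, y)[0])
```

The negative-curvature branch in steihaug_cg is where curvature information pays off: when the sampled Hessian exposes a direction of negative curvature, the step follows it to the trust-region boundary, which is the mechanism that lets TR/ARC-type methods move away from saddle points and flat regions, consistent with the behaviour described in the abstract.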
Pages: 199-207
Number of pages: 9
Related Papers
50 records in total
  • [1] Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
    Arjevani, Yossi
    Carmon, Yair
    Duchi, John C.
    Foster, Dylan J.
    Sekhari, Ayush
    Sridharan, Karthik
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [2] Second-Order Step-Size Tuning of SGD for Non-Convex Optimization
    Castera, Camille
    Bolte, Jerome
    Fevotte, Cedric
    Pauwels, Edouard
    NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1727 - 1752
  • [3] Second-Order Optimality in Non-Convex Decentralized Optimization via Perturbed Gradient Tracking
    Tziotis, Isidoros
    Caramanis, Constantine
    Mokhtari, Aryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks
    Ganesh, Arun
    Liu, Daogao
    Oh, Sewoong
    Thakurta, Abhradeep
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Non-convex Optimization on Stiefel Manifold and Applications to Machine Learning
    Kanamori, Takafumi
    Takeda, Akiko
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 109 - 116
  • [6] Linearized ADMM Converges to Second-Order Stationary Points for Non-Convex Problems
    Lu, Songtao
    Lee, Jason D.
    Razaviyayn, Meisam
    Hong, Mingyi
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 4859 - 4874
  • [7] Non-convex second-order Moreau's sweeping processes in Hilbert spaces
    Lounis, Sabrina
    Haddad, Tahar
    Sene, Moustapha
    JOURNAL OF FIXED POINT THEORY AND APPLICATIONS, 2017, 19 (04) : 2895 - 2908
  • [8] Spectral bundle methods for non-convex maximum eigenvalue functions: second-order methods
    Noll, D
    Apkarian, P
    MATHEMATICAL PROGRAMMING, 2005, 104 (2-3) : 729 - 747