Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
Australian Research Council
Keywords
DOI
10.1137/1.9781611976236.23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to hyper-parameter settings such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
Pages: 199-207 (9 pages)
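The abstract above describes sub-sampled trust-region (TR) Newton methods. The following is a minimal, illustrative sketch of a single sub-sampled TR step for a finite-sum loss; it is not the authors' code. The callables `grad_full` and `hess_i` are hypothetical user-supplied functions, and the TR subproblem is solved with the standard Steihaug-Toint truncated conjugate-gradient method under these assumptions.

```python
# Sketch of one sub-sampled trust-region Newton step for
# f(w) = (1/n) * sum_i f_i(w). Hypothetical callables:
#   grad_full(w)  -> full (or mini-batch) gradient, shape (d,)
#   hess_i(w, i)  -> Hessian of the i-th example's loss, shape (d, d)
import numpy as np


def subsampled_tr_step(w, grad_full, hess_i, n, radius, sample_size, rng):
    """Return a trial step p with ||p|| <= radius, using a Hessian
    estimated from a random subsample of the n training examples."""
    g = grad_full(w)
    if np.linalg.norm(g) < 1e-12:
        return np.zeros_like(w)          # already (approximately) stationary
    idx = rng.choice(n, size=sample_size, replace=False)

    def Hv(v):
        # Sub-sampled Hessian-vector product: average H_i(w) @ v over the sample.
        return sum(hess_i(w, i) @ v for i in idx) / sample_size

    # Steihaug-Toint CG: approximately minimize g^T p + 0.5 p^T H p  s.t. ||p|| <= radius.
    p = np.zeros_like(w)
    r = g.copy()                          # residual of the Newton system at p = 0
    d = -g.copy()                         # initial search direction (steepest descent)
    for _ in range(w.size):
        Hd = Hv(d)
        dHd = d @ Hd
        if dHd <= 0:                      # negative curvature: step to the boundary
            return p + _tau_to_boundary(p, d, radius) * d
        alpha = (r @ r) / dHd
        if np.linalg.norm(p + alpha * d) >= radius:   # full CG step leaves the region
            return p + _tau_to_boundary(p, d, radius) * d
        p = p + alpha * d
        r_new = r + alpha * Hd
        if np.linalg.norm(r_new) < 1e-8:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d
        r = r_new
    return p


def _tau_to_boundary(p, d, radius):
    """Positive root tau of ||p + tau * d|| = radius."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - radius ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

In a full TR algorithm, this trial step would then be accepted or rejected, and the trust-region radius enlarged or shrunk, based on the ratio of the actual decrease in the loss to the decrease predicted by the quadratic model.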