Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors:
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations:
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding:
Australian Research Council
DOI:
10.1137/1.9781611976236.23
Chinese Library Classification:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, on non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
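To make the approach described in the abstract concrete, below is a minimal, illustrative Python sketch of a sub-sampled trust-region (TR) Newton solver. It is not the authors' implementation: the toy non-convex objective (squared loss on a sigmoid output), the batch size, the radius-update constants, and all function names (loss_grad_hess, solve_tr_subproblem, subsampled_tr_newton) are assumptions made for illustration only. The sketch shows the two ingredients the abstract highlights: a Hessian estimated on a random subsample, and a trust-region step that is accepted or rejected by comparing actual to predicted reduction.

```python
# Illustrative sketch (not the paper's code) of one sub-sampled TR Newton loop.
# TR subproblem:  min_s  g^T s + 0.5 s^T H_S s   s.t.  ||s|| <= Delta,
# where H_S is the Hessian of the loss over a random subsample S.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_hess(w, X, y):
    """Mean loss, gradient and Hessian of the non-convex toy objective
    f(w) = mean_i (sigmoid(x_i^T w) - y_i)^2 over the rows of X."""
    z = X @ w
    p = sigmoid(z)
    r = p - y
    s1 = p * (1.0 - p)                                  # sigma'(z)
    dz = 2.0 * r * s1                                   # d loss / dz
    d2z = 2.0 * (s1 ** 2 + r * s1 * (1.0 - 2.0 * p))    # d^2 loss / dz^2
    f = np.mean(r ** 2)
    g = X.T @ dz / len(y)
    H = (X * d2z[:, None]).T @ X / len(y)
    return f, g, H

def solve_tr_subproblem(g, H, delta):
    """Nearly exact TR subproblem solve via an eigen-decomposition of H
    (small dimension only; the so-called 'hard case' is ignored here)."""
    evals, Q = np.linalg.eigh(H)
    gq = Q.T @ g
    step = lambda lam: -Q @ (gq / (evals + lam))
    if evals.min() > 0 and np.linalg.norm(step(0.0)) <= delta:
        return step(0.0)                                # Newton step fits inside
    lo = max(0.0, -evals.min()) + 1e-12
    hi = max(1.0, -evals.min()) + 1e-12
    while np.linalg.norm(step(hi)) > delta:             # bracket the multiplier
        hi *= 2.0
    for _ in range(60):                                  # bisection on ||s(lam)|| = delta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(step(mid)) > delta else (lo, mid)
    return step(hi)

def subsampled_tr_newton(X, y, iters=50, batch=64, delta=1.0, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        f, g, _ = loss_grad_hess(w, X, y)                # full loss and gradient
        idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
        _, _, H_S = loss_grad_hess(w, X[idx], y[idx])    # sub-sampled Hessian
        s = solve_tr_subproblem(g, H_S, delta)
        pred = -(g @ s + 0.5 * s @ H_S @ s)              # predicted model reduction
        f_new, _, _ = loss_grad_hess(w + s, X, y)
        rho = (f - f_new) / max(pred, 1e-12)             # actual / predicted reduction
        if rho > eta:                                    # accept the step
            w = w + s
        delta = 2.0 * delta if rho > 0.75 else (0.5 * delta if rho < 0.25 else delta)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 10))
    y = (X @ rng.standard_normal(10) > 0).astype(float)
    w = subsampled_tr_newton(X, y)
    print("final loss:", loss_grad_hess(w, X, y)[0])
```

The ARC variant evaluated in the paper would, roughly, replace the norm constraint with a cubic penalty (sigma/3)*||s||^3 added to the quadratic model and adapt sigma instead of Delta; the sub-sampling and acceptance logic are analogous.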
Pages: 199-207
Page count: 9