Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
Australian Research Council
Keywords
DOI
10.1137/1.9781611976236.23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to hyper-parameter settings such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
Pages: 199-207 (9 pages)
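The abstract above describes sub-sampled trust-region (TR) Newton methods. The following is a minimal, illustrative sketch of a single sub-sampled TR step for a finite-sum loss; it is not the authors' code. The callables `grad_full` and `hess_i` are hypothetical user-supplied functions, and the TR subproblem is solved with the standard Steihaug-Toint truncated conjugate-gradient method under these assumptions.

```python
# Sketch of one sub-sampled trust-region Newton step for
# f(w) = (1/n) * sum_i f_i(w). Hypothetical callables:
#   grad_full(w)  -> full (or mini-batch) gradient, shape (d,)
#   hess_i(w, i)  -> Hessian of the i-th example's loss, shape (d, d)
import numpy as np


def subsampled_tr_step(w, grad_full, hess_i, n, radius, sample_size, rng):
    """Return a trial step p with ||p|| <= radius, using a Hessian
    estimated from a random subsample of the n training examples."""
    g = grad_full(w)
    if np.linalg.norm(g) < 1e-12:
        return np.zeros_like(w)          # already (approximately) stationary
    idx = rng.choice(n, size=sample_size, replace=False)

    def Hv(v):
        # Sub-sampled Hessian-vector product: average H_i(w) @ v over the sample.
        return sum(hess_i(w, i) @ v for i in idx) / sample_size

    # Steihaug-Toint CG: approximately minimize g^T p + 0.5 p^T H p  s.t. ||p|| <= radius.
    p = np.zeros_like(w)
    r = g.copy()                          # residual of the Newton system at p = 0
    d = -g.copy()                         # initial search direction (steepest descent)
    for _ in range(w.size):
        Hd = Hv(d)
        dHd = d @ Hd
        if dHd <= 0:                      # negative curvature: step to the boundary
            return p + _tau_to_boundary(p, d, radius) * d
        alpha = (r @ r) / dHd
        if np.linalg.norm(p + alpha * d) >= radius:   # full CG step leaves the region
            return p + _tau_to_boundary(p, d, radius) * d
        p = p + alpha * d
        r_new = r + alpha * Hd
        if np.linalg.norm(r_new) < 1e-8:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d
        r = r_new
    return p


def _tau_to_boundary(p, d, radius):
    """Positive root tau of ||p + tau * d|| = radius."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - radius ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

In a full TR algorithm, this trial step would then be accepted or rejected, and the trust-region radius enlarged or shrunk, based on the ratio of the actual decrease in the loss to the decrease predicted by the quadratic model.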