Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors:
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations:
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding:
Australian Research Council
DOI:
10.1137/1.9781611976236.23
Chinese Library Classification:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, on non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
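To make the approach described in the abstract concrete, below is a minimal, illustrative Python sketch of a sub-sampled trust-region (TR) Newton solver. It is not the authors' implementation: the toy non-convex objective (squared loss on a sigmoid output), the batch size, the radius-update constants, and all function names (loss_grad_hess, solve_tr_subproblem, subsampled_tr_newton) are assumptions made for illustration only. The sketch shows the two ingredients the abstract highlights: a Hessian estimated on a random subsample, and a trust-region step that is accepted or rejected by comparing actual to predicted reduction.

```python
# Illustrative sketch (not the paper's code) of one sub-sampled TR Newton loop.
# TR subproblem:  min_s  g^T s + 0.5 s^T H_S s   s.t.  ||s|| <= Delta,
# where H_S is the Hessian of the loss over a random subsample S.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_hess(w, X, y):
    """Mean loss, gradient and Hessian of the non-convex toy objective
    f(w) = mean_i (sigmoid(x_i^T w) - y_i)^2 over the rows of X."""
    z = X @ w
    p = sigmoid(z)
    r = p - y
    s1 = p * (1.0 - p)                                  # sigma'(z)
    dz = 2.0 * r * s1                                   # d loss / dz
    d2z = 2.0 * (s1 ** 2 + r * s1 * (1.0 - 2.0 * p))    # d^2 loss / dz^2
    f = np.mean(r ** 2)
    g = X.T @ dz / len(y)
    H = (X * d2z[:, None]).T @ X / len(y)
    return f, g, H

def solve_tr_subproblem(g, H, delta):
    """Nearly exact TR subproblem solve via an eigen-decomposition of H
    (small dimension only; the so-called 'hard case' is ignored here)."""
    evals, Q = np.linalg.eigh(H)
    gq = Q.T @ g
    step = lambda lam: -Q @ (gq / (evals + lam))
    if evals.min() > 0 and np.linalg.norm(step(0.0)) <= delta:
        return step(0.0)                                # Newton step fits inside
    lo = max(0.0, -evals.min()) + 1e-12
    hi = max(1.0, -evals.min()) + 1e-12
    while np.linalg.norm(step(hi)) > delta:             # bracket the multiplier
        hi *= 2.0
    for _ in range(60):                                  # bisection on ||s(lam)|| = delta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(step(mid)) > delta else (lo, mid)
    return step(hi)

def subsampled_tr_newton(X, y, iters=50, batch=64, delta=1.0, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        f, g, _ = loss_grad_hess(w, X, y)                # full loss and gradient
        idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
        _, _, H_S = loss_grad_hess(w, X[idx], y[idx])    # sub-sampled Hessian
        s = solve_tr_subproblem(g, H_S, delta)
        pred = -(g @ s + 0.5 * s @ H_S @ s)              # predicted model reduction
        f_new, _, _ = loss_grad_hess(w + s, X, y)
        rho = (f - f_new) / max(pred, 1e-12)             # actual / predicted reduction
        if rho > eta:                                    # accept the step
            w = w + s
        delta = 2.0 * delta if rho > 0.75 else (0.5 * delta if rho < 0.25 else delta)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 10))
    y = (X @ rng.standard_normal(10) > 0).astype(float)
    w = subsampled_tr_newton(X, y)
    print("final loss:", loss_grad_hess(w, X, y)[0])
```

The ARC variant evaluated in the paper would, roughly, replace the norm constraint with a cubic penalty (sigma/3)*||s||^3 added to the quadratic model and adapt sigma instead of Delta; the sub-sampling and acceptance logic are analogous.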
Pages: 199-207
Page count: 9