Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Cited by: 59
Authors
Xu, Peng [1 ]
Roosta, Fred [2 ,3 ]
Mahoney, Michael W. [4 ,5 ]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
Australian Research Council;
Keywords
DOI
10.1137/1.9781611976236.23
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While first-order optimization methods such as SGD are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods are not only computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but also highly robust to hyper-parameter settings. Further, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.
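To make the class of methods concrete, the sketch below shows one plausible form of a sub-sampled trust-region Newton iteration: curvature is accessed only through Hessian-vector products computed on a random sub-sample, the TR subproblem is solved approximately with CG-Steihaug, and the radius is adapted from the ratio of actual to predicted decrease. This is a minimal illustrative sketch, not the authors' implementation; the toy tanh-regression objective, the function names (steihaug_cg, subsampled_tr, subsampled_hvp), the sample size, and the 0.25/0.75 radius-update thresholds are all assumptions made for exposition.

```python
import numpy as np

def steihaug_cg(g, hvp, radius, tol=1e-6, max_iter=50):
    """Approximately solve  min_s  g.s + 0.5 s.H s  s.t. ||s|| <= radius
    by truncated conjugate gradients (CG-Steihaug)."""
    s = np.zeros_like(g)
    r = g.copy()                      # residual of the model gradient, H s + g
    d = -r
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(max_iter):
        Hd = hvp(d)
        dHd = d @ Hd
        if dHd <= 0:                  # negative curvature: follow d to the boundary
            return s + _tau_to_boundary(s, d, radius) * d
        alpha = (r @ r) / dHd
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= radius:   # step would leave the trust region
            return s + _tau_to_boundary(s, d, radius) * d
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return s_next
        d = -r_next + ((r_next @ r_next) / (r @ r)) * d
        s, r = s_next, r_next
    return s

def _tau_to_boundary(s, d, radius):
    """Positive root tau of ||s + tau d|| = radius."""
    a, b, c = d @ d, 2 * (s @ d), s @ s - radius ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

def loss_and_grad(w, X, y):
    """Non-convex toy objective: 0.5 * mean((tanh(Xw) - y)^2)."""
    p = np.tanh(X @ w)
    e = p - y
    return 0.5 * np.mean(e ** 2), X.T @ (e * (1 - p ** 2)) / len(y)

def subsampled_hvp(w, X, y, idx):
    """Hessian-vector product of the toy objective, using only the rows in idx."""
    Xs, ys = X[idx], y[idx]
    p = np.tanh(Xs @ w)
    e = p - ys
    coeff = (1 - p ** 2) ** 2 - 2 * e * p * (1 - p ** 2)
    return lambda v: Xs.T @ (coeff * (Xs @ v)) / len(idx)

def subsampled_tr(X, y, sample_size=64, radius=1.0, eta=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.normal(size=X.shape[1])
    for _ in range(max_iter):
        loss, g = loss_and_grad(w, X, y)
        if np.linalg.norm(g) < 1e-8:
            break
        idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
        hvp = subsampled_hvp(w, X, y, idx)      # curvature from a sub-sample only
        s = steihaug_cg(g, hvp, radius)
        pred = -(g @ s + 0.5 * (s @ hvp(s)))    # decrease predicted by the model
        actual = loss - loss_and_grad(w + s, X, y)[0]
        rho = actual / max(pred, 1e-12)
        if rho > 0.75 and np.linalg.norm(s) > 0.9 * radius:
            radius *= 2.0                        # model is trustworthy: expand
        elif rho < 0.25:
            radius *= 0.25                       # poor agreement: shrink
        if rho > eta:
            w = w + s                            # accept the step
    return w

# Tiny usage example on synthetic data: y ~ tanh(X w_true) + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = np.tanh(X @ rng.normal(size=10)) + 0.01 * rng.normal(size=500)
w_hat = subsampled_tr(X, y)
print("final training loss:", loss_and_grad(w_hat, X, y)[0])
```

The negative-curvature branch in steihaug_cg is where curvature information pays off: when the sampled Hessian exposes a direction of negative curvature, the step follows it to the trust-region boundary, which is the mechanism that lets TR/ARC-type methods move away from saddle points and flat regions, consistent with the behaviour described in the abstract.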
Pages: 199-207
Number of pages: 9
Related Papers
50 records in total
  • [1] Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations
    Arjevani, Yossi
    Carmon, Yair
    Duchi, John C.
    Foster, Dylan J.
    Sekhari, Ayush
    Sridharan, Karthik
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [2] Second-Order Step-Size Tuning of SGD for Non-Convex Optimization
    Castera, Camille
    Bolte, Jerome
    Fevotte, Cedric
    Pauwels, Edouard
    NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1727 - 1752
  • [3] Second-Order Optimality in Non-Convex Decentralized Optimization via Perturbed Gradient Tracking
    Tziotis, Isidoros
    Caramanis, Constantine
    Mokhtari, Aryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks
    Ganesh, Arun
    Liu, Daogao
    Oh, Sewoong
    Thakurta, Abhradeep
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Non-convex Optimization on Stiefel Manifold and Applications to Machine Learning
    Kanamori, Takafumi
    Takeda, Akiko
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 109 - 116
  • [6] Linearized ADMM Converges to Second-Order Stationary Points for Non-Convex Problems
    Lu, Songtao
    Lee, Jason D.
    Razaviyayn, Meisam
    Hong, Mingyi
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 4859 - 4874
  • [7] Non-convex second-order Moreau's sweeping processes in Hilbert spaces
    Lounis, Sabrina
    Haddad, Tahar
    Sene, Moustapha
    JOURNAL OF FIXED POINT THEORY AND APPLICATIONS, 2017, 19 (04) : 2895 - 2908
  • [8] Spectral bundle methods for non-convex maximum eigenvalue functions: second-order methods
    Noll, D
    Apkarian, P
    MATHEMATICAL PROGRAMMING, 2005, 104 (2-3) : 729 - 747