Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cited by: 0
Authors
Cui, Xiaodong [1 ]
Zhang, Wei [1 ]
Tuske, Zoltan [1 ]
Picheny, Michael [1 ]
Affiliations
[1] IBM Research AI, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyperparameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.
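For illustration only, the alternating loop described in the abstract can be sketched in a few lines of Python. The sketch below is not the authors' implementation: it optimizes a toy quadratic loss with NumPy in place of a deep network, and the function names, population size, elite fraction, mutation scale, and per-individual learning rates are illustrative assumptions chosen only to show the back-off and elitist mechanisms.

import numpy as np

rng = np.random.default_rng(0)
DIM, POP_SIZE, GENERATIONS, SGD_STEPS = 10, 8, 25, 5

def fitness(theta):
    # Toy stand-in for the (held-out) loss of a network with parameters theta.
    return float(np.sum((theta - 1.0) ** 2))

def noisy_grad(theta):
    # Stochastic gradient of the toy loss; Gaussian noise imitates minibatch noise.
    return 2.0 * (theta - 1.0) + rng.normal(scale=0.1, size=theta.shape)

# Each individual carries its own parameters and its own SGD hyperparameter
# (here just a learning rate), playing the role of a competing species.
population = [{"theta": rng.normal(size=DIM), "lr": lr}
              for lr in np.geomspace(1e-3, 3e-1, POP_SIZE)]

for gen in range(GENERATIONS):
    # SGD step with back-off: an individual's update is kept only if its
    # fitness does not degrade; otherwise the individual reverts.
    for ind in population:
        before = fitness(ind["theta"])
        theta = ind["theta"].copy()
        for _ in range(SGD_STEPS):
            theta -= ind["lr"] * noisy_grad(theta)
        if fitness(theta) <= before:
            ind["theta"] = theta

    # Evolution step with elitism: the fittest half survives unchanged, so the
    # best fitness in the population never degrades; the rest is replaced by
    # perturbed recombinations of elite parents (a gradient-free move).
    population.sort(key=lambda ind: fitness(ind["theta"]))
    elites = population[: POP_SIZE // 2]
    offspring = []
    while len(elites) + len(offspring) < POP_SIZE:
        ia, ib = rng.choice(len(elites), size=2, replace=False)
        child = 0.5 * (elites[ia]["theta"] + elites[ib]["theta"])
        child += rng.normal(scale=0.05, size=DIM)
        lr = elites[ia]["lr"] if rng.random() < 0.5 else elites[ib]["lr"]
        offspring.append({"theta": child, "lr": lr})
    population = elites + offspring

    print(f"generation {gen:2d}  best fitness {fitness(population[0]['theta']):.4f}")

In the paper each individual is a full network trained with one of several SGD-based optimizers under distinct hyperparameters, and fitness is measured on held-out data; the single learning-rate knob and quadratic loss above are simplifications for brevity.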
Pages: 11
Related papers (50 in total)
  • [1] Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks
    Morse, Gregory; Stanley, Kenneth O.
    GECCO'16: Proceedings of the 2016 Genetic and Evolutionary Computation Conference, 2016: 477-484
  • [2] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei; Jing, Xiaoyuan; Du, Wencai; Chen, Guoliang
    Science China Information Sciences, 2021, 64(5): 17-31
  • [3] Optimizing Deep Neural Networks Through Neuroevolution With Stochastic Gradient Descent
    Zhang, Haichao; Hao, Kuangrong; Gao, Lei; Wei, Bing; Tang, Xuesong
    IEEE Transactions on Cognitive and Developmental Systems, 2023, 15(1): 111-121
  • [4] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan; Gu, Quanquan
    Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019
  • [5] Strengthening Gradient Descent by Sequential Motion Optimization for Deep Neural Networks
    Le-Duc, Thang; Nguyen, Quoc-Hung; Lee, Jaehong; Nguyen-Xuan, H.
    IEEE Transactions on Evolutionary Computation, 2023, 27(3): 565-579
  • [6] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick; Jentzen, Arnulf; Rossmannek, Florian
    Journal of Complexity, 2021, 64
  • [7] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Zhou, Bai-cun; Han, Cong-ying; Guo, Tian-de
    Acta Mathematicae Applicatae Sinica, English Series, 2021, 37(1): 126-136