Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Cited by: 0
Authors
Cui, Xiaodong [1 ]
Zhang, Wei [1 ]
Tuske, Zoltan [1 ]
Picheny, Michael [1 ]
Affiliations
[1] IBM Research AI, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyperparameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.
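For illustration only, the alternating loop described in the abstract can be sketched in a few lines of Python. The sketch below is not the authors' implementation: it optimizes a toy quadratic loss with NumPy in place of a deep network, and the function names, population size, elite fraction, mutation scale, and per-individual learning rates are illustrative assumptions chosen only to show the back-off and elitist mechanisms.

import numpy as np

rng = np.random.default_rng(0)
DIM, POP_SIZE, GENERATIONS, SGD_STEPS = 10, 8, 25, 5

def fitness(theta):
    # Toy stand-in for the (held-out) loss of a network with parameters theta.
    return float(np.sum((theta - 1.0) ** 2))

def noisy_grad(theta):
    # Stochastic gradient of the toy loss; Gaussian noise imitates minibatch noise.
    return 2.0 * (theta - 1.0) + rng.normal(scale=0.1, size=theta.shape)

# Each individual carries its own parameters and its own SGD hyperparameter
# (here just a learning rate), playing the role of a competing species.
population = [{"theta": rng.normal(size=DIM), "lr": lr}
              for lr in np.geomspace(1e-3, 3e-1, POP_SIZE)]

for gen in range(GENERATIONS):
    # SGD step with back-off: an individual's update is kept only if its
    # fitness does not degrade; otherwise the individual reverts.
    for ind in population:
        before = fitness(ind["theta"])
        theta = ind["theta"].copy()
        for _ in range(SGD_STEPS):
            theta -= ind["lr"] * noisy_grad(theta)
        if fitness(theta) <= before:
            ind["theta"] = theta

    # Evolution step with elitism: the fittest half survives unchanged, so the
    # best fitness in the population never degrades; the rest is replaced by
    # perturbed recombinations of elite parents (a gradient-free move).
    population.sort(key=lambda ind: fitness(ind["theta"]))
    elites = population[: POP_SIZE // 2]
    offspring = []
    while len(elites) + len(offspring) < POP_SIZE:
        ia, ib = rng.choice(len(elites), size=2, replace=False)
        child = 0.5 * (elites[ia]["theta"] + elites[ib]["theta"])
        child += rng.normal(scale=0.05, size=DIM)
        lr = elites[ia]["lr"] if rng.random() < 0.5 else elites[ib]["lr"]
        offspring.append({"theta": child, "lr": lr})
    population = elites + offspring

    print(f"generation {gen:2d}  best fitness {fitness(population[0]['theta']):.4f}")

In the paper each individual is a full network trained with one of several SGD-based optimizers under distinct hyperparameters, and fitness is measured on held-out data; the single learning-rate knob and quadratic loss above are simplifications for brevity.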
Pages: 11
Related papers (50 in total)
  • [1] Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks
    Morse, Gregory; Stanley, Kenneth O.
    GECCO'16: Proceedings of the 2016 Genetic and Evolutionary Computation Conference, 2016: 477-484
  • [2] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei; Jing, Xiaoyuan; Du, Wencai; Chen, Guoliang
    Science China Information Sciences, 2021, 64(5): 17-31
  • [3] Optimizing Deep Neural Networks Through Neuroevolution With Stochastic Gradient Descent
    Zhang, Haichao; Hao, Kuangrong; Gao, Lei; Wei, Bing; Tang, Xuesong
    IEEE Transactions on Cognitive and Developmental Systems, 2023, 15(1): 111-121
  • [4] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan; Gu, Quanquan
    Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019
  • [5] Strengthening Gradient Descent by Sequential Motion Optimization for Deep Neural Networks
    Le-Duc, Thang; Nguyen, Quoc-Hung; Lee, Jaehong; Nguyen-Xuan, H.
    IEEE Transactions on Evolutionary Computation, 2023, 27(3): 565-579
  • [6] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick; Jentzen, Arnulf; Rossmannek, Florian
    Journal of Complexity, 2021, 64
  • [7] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Zhou, Bai-cun; Han, Cong-ying; Guo, Tian-de
    Acta Mathematicae Applicatae Sinica, English Series, 2021, 37(1): 126-136