Exploring Strategies for Training Deep Neural Networks

Cited: 0
Authors
Larochelle, Hugo [1 ]
Bengio, Yoshua [1 ]
Louradour, Jerome [1 ]
Lamblin, Pascal [1 ]
Affiliation
[1] Univ Montreal, Dept Informat & Rech Operat, Montreal, PQ H3T 1J8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
artificial neural networks; deep belief networks; restricted Boltzmann machines; autoassociators; unsupervised learning; COMPONENT ANALYSIS; BLIND SEPARATION; DIMENSIONALITY; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by the proposal of another greedy layer-wise procedure, relying on the usage of autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input. We also present a series of experiments aimed at evaluating the link between the performance of deep neural networks and practical aspects of their topology, for example, demonstrating cases where the addition of more depth helps. Finally, we empirically explore simple variants of these training algorithms, such as the use of different RBM input unit distributions, a simple way of combining gradient estimators to improve performance, as well as on-line versions of those algorithms.
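The abstract describes greedy layer-wise unsupervised pretraining with restricted Boltzmann machines. The following is a minimal NumPy sketch of that idea, not the authors' code: each layer is a binary RBM trained with one step of contrastive divergence (CD-1), and its hidden activations become the input of the next RBM in the stack. The names (RBM, greedy_pretrain), layer sizes, learning rate, and epoch counts are illustrative assumptions.

# Minimal sketch (assumed hyperparameters) of greedy layer-wise RBM pretraining.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden probabilities and samples given the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 approximation to the log-likelihood gradient.
        batch = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)

def greedy_pretrain(data, layer_sizes, epochs=5, batch=20):
    """Train a stack of RBMs bottom-up and return them for later fine-tuning."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(x), batch):
                rbm.cd1_update(x[i:i + batch])
        rbms.append(rbm)
        # Hidden probabilities of this layer become the next layer's input.
        x = rbm.hidden_probs(x)
    return rbms

# Toy usage: 500 random binary vectors of dimension 64, two hidden layers.
toy = (rng.random((500, 64)) < 0.3).astype(float)
stack = greedy_pretrain(toy, layer_sizes=[32, 16])
print([r.W.shape for r in stack])   # [(64, 32), (32, 16)]

In the experiments the paper studies, such a pretrained stack initializes a feed-forward network that is then fine-tuned with supervised gradient descent (omitted here); the autoassociator variant mentioned in the abstract replaces each RBM with an autoencoder trained to reconstruct its input.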
Pages: 1 / 40
Page count: 40
Related Papers
50 records in total
  • [21] Local Critic Training of Deep Neural Networks
    Lee, Hojung
    Lee, Jong-Seok
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [22] An Optimization Strategy for Deep Neural Networks Training
    Wu, Tingting
    Zeng, Peng
    Song, Chunhe
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 596 - 603
  • [23] The Impact of Architecture on the Deep Neural Networks Training
    Rozycki, Pawel
    Kolbusz, Janusz
    Malinowski, Aleksander
    Wilamowski, Bogdan
    2019 12TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2019, : 41 - 46
  • [24] DANTE: Deep alternations for training neural networks
    Sinha, Vaibhav B.
    Kudugunta, Sneha
    Sankar, Adepu Ravi
    Chavali, Surya Teja
    Balasubramanian, Vineeth N.
    NEURAL NETWORKS, 2020, 131 : 127 - 143
  • [25] Deep Energy: Task Driven Training of Deep Neural Networks
    Golts, Alona
    Freedman, Daniel
    Elad, Michael
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2021, 15 (02) : 324 - 338
  • [26] Exploring the Design Space of Efficient Deep Neural Networks
    Yu, Fuxun
    Stamoulis, Dimitrios
    Wang, Di
    Lymberopoulos, Dimitrios
    Chen, Xiang
    2020 IEEE/ACM SYMPOSIUM ON EDGE COMPUTING (SEC 2020), 2020, : 317 - 318
  • [27] Exploring deep neural networks for multitarget stance detection
    Sobhani, Parinaz
    Inkpen, Diana
    Zhu, Xiaodan
    COMPUTATIONAL INTELLIGENCE, 2019, 35 (01) : 82 - 97
  • [28] Exploring robust architectures for deep artificial neural networks
    Waqas, Asim
    Farooq, Hamza
    Bouaynaya, Nidhal C.
    Rasool, Ghulam
    COMMUNICATIONS ENGINEERING, 1 (1):
  • [29] EXPLORING DEEP NEURAL NETWORKS AND DEEP AUTOENCODERS IN REVERBERANT SPEECH RECOGNITION
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 197 - 201
  • [30] Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training
    Fang, Cong
    He, Hangfeng
    Long, Qi
    Su, Weijie J.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2021, 118 (43)