On the Local Hessian in Back-propagation

Cited by: 0
Authors
Zhang, Huishuai [1 ]
Chen, Wei [1 ]
Liu, Tie-Yan [1 ]
Affiliations
[1] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
NEURAL-NETWORKS
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Back-propagation (BP) is the foundation for successfully training deep neural networks. However, BP sometimes has difficulty propagating a learning signal effectively through deep networks, e.g., the vanishing-gradient phenomenon. Meanwhile, BP often works well when combined with "design tricks" such as orthogonal initialization, batch normalization, and skip connections. There is no clear understanding of what is essential to the efficiency of BP. In this paper, we take a step towards clarifying this problem. We view BP as a solution of back-matching propagation, which minimizes a sequence of back-matching losses, each corresponding to one block of the network. We study the Hessian of the local back-matching loss (the local Hessian) and connect it to the efficiency of BP. It turns out that these design tricks facilitate BP by improving the spectrum of the local Hessian. In addition, we can utilize the local Hessian to balance the training pace of each block and to design new training algorithms. Based on a scalar approximation of the local Hessian, we propose a scale-amended SGD algorithm. We apply it to train neural networks with batch normalization and achieve favorable results over vanilla SGD. This further corroborates the importance of the local Hessian.
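To make the mechanism concrete, below is a minimal sketch (not the authors' code) of what a scale-amended SGD step might look like. It assumes the back-matching loss of block b roughly takes the form (1/2)·||z_hat_b − F_b(W_b, z_{b−1})||², where z_hat_b is the target passed down from the block above, and it uses a hypothetical scalar proxy for the local-Hessian scale (the mean squared activation entering each block); the function name scale_amended_sgd_step and this proxy are illustrative assumptions, not the paper's exact construction.

    import torch

    def scale_amended_sgd_step(blocks, block_inputs, lr=0.1, eps=1e-8):
        # One SGD step in which each block's gradient is divided by a
        # scalar proxy for its local Hessian, balancing the training
        # pace across blocks. `blocks` is a list of modules whose
        # .grad fields were already filled by back-propagation;
        # `block_inputs` holds the activation entering each block.
        for block, x in zip(blocks, block_inputs):
            # Hypothetical proxy for the local-Hessian scale: the
            # mean squared activation entering the block.
            scale = x.detach().pow(2).mean().item() + eps
            for p in block.parameters():
                if p.grad is not None:
                    # Amended update: vanilla SGD rescaled per block.
                    p.data.add_(p.grad, alpha=-lr / scale)

In this form the amendment reduces to a per-block learning rate; for batch-normalized networks, the paper's actual scalar approximation of the local Hessian would replace the proxy used above.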
Pages: 11
Related Papers
50 items in total
  • [21] On weights initialization of back-propagation networks
    Univ of Stuttgart, Stuttgart, Germany
    Neural Network World, 1: 89 - 100
  • [22] Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty
    Meuleau, Nicolas
    Bourgine, Paul
    MACHINE LEARNING, 1999, 35 (02) : 117 - 154
  • [24] An Interpretation of Forward-Propagation and Back-Propagation of DNN
    Xie, Guotian
    Lai, Jianhuang
    PATTERN RECOGNITION AND COMPUTER VISION, PT II, 2018, 11257 : 3 - 15
  • [25] Reviving and Improving Recurrent Back-Propagation
    Liao, Renjie
    Xiong, Yuwen
    Fetaya, Ethan
    Zhang, Lisa
    Yoon, KiJung
    Pitkow, Xaq
    Urtasun, Raquel
    Zemel, Richard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [26] CT image reconstruction by back-propagation
    Nakao, Z
    Ali, FEF
    Chen, YW
    FIRST INTERNATIONAL CONFERENCE ON KNOWLEDGE-BASED INTELLIGENT ELECTRONIC SYSTEMS, PROCEEDINGS 1997 - KES '97, VOLS 1 AND 2, 1997, : 323 - 326
  • [27] Theories of Error Back-Propagation in the Brain
    Whittington, James C. R.
    Bogacz, Rafal
    TRENDS IN COGNITIVE SCIENCES, 2019, 23 (03) : 235 - 250
  • [28] A parallel back-propagation adder structure
    Herrfeld, A
    Hentschke, S
    INTERNATIONAL JOURNAL OF ELECTRONICS, 1998, 85 (03) : 273 - 291
  • [29] Alternating Back-Propagation for Generator Network
    Han, Tian
    Lu, Yang
    Zhu, Song-Chun
    Wu, Ying Nian
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1976 - 1984
  • [30] Truncated Back-propagation for Bilevel Optimization
    Shaban, Amirreza
    Cheng, Ching-An
    Hatch, Nathan
    Boots, Byron
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89