Hebbian Descent: A Unified View on Log-Likelihood Learning

Cited by: 0
Authors
Melchior, Jan [1 ]
Schiewer, Robin [1 ]
Wiskott, Laurenz [1 ]
Affiliation
[1] Ruhr Univ Bochum, D-44801 Bochum, Germany
Keywords
CONNECTIONIST MODELS; BACKPROPAGATION; STORAGE
DOI
10.1162/neco_a_01684
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, implemented through an alternative loss function for gradient descent that we refer to as the Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer in which the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are used only in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as is often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output-layer activation function (e.g., mean squared error with a linear output or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary combinations of loss and activation function that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, performs similarly to regular gradient descent, and performs much better than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent performs better than or similarly to gradient descent.
Pages: 1669-1712
Page count: 44
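The abstract describes the output-layer update as gradient descent with the derivative of the activation function disregarded, combined with centering of the inputs. Below is a minimal NumPy sketch of that idea, assuming a sigmoid output layer trained per sample on mean squared error; the function names, the running input mean `mu`, and the per-sample update loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def gradient_descent_step(W, b, x, t, eta):
    """Plain gradient descent on the mean squared error with a sigmoid output:
    the error signal is multiplied by the activation derivative
    sigmoid'(h) = y * (1 - y), which vanishes when the unit saturates."""
    h = W @ x + b
    y = sigmoid(h)
    delta = (y - t) * y * (1.0 - y)      # error signal including a'(h)
    W -= eta * np.outer(delta, x)
    b -= eta * delta
    return W, b

def hebbian_descent_step(W, b, x, t, eta, mu):
    """Sketch of a Hebbian-descent-style step as described in the abstract:
    the activation derivative is dropped from the output-layer update, and
    the input is centered by subtracting a (running) mean mu (assumption)."""
    x_c = x - mu                         # centering
    h = W @ x_c + b
    y = sigmoid(h)
    delta = y - t                        # activation derivative disregarded
    W -= eta * np.outer(delta, x_c)
    b -= eta * delta
    return W, b
```

Note that with a sigmoid output the Hebbian descent step above coincides with the gradient of the cross-entropy loss, consistent with the abstract's remark that established loss/activation pairings are subsumed by Hebbian descent.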