Hebbian Descent: A Unified View on Log-Likelihood Learning

Cited by: 0

Authors:

Melchior, Jan [1]

Schiewer, Robin [1]

Wiskott, Laurenz [1]

Affiliations:

[1] Ruhr Univ Bochum, D-44801 Bochum, Germany

Source:

NEURAL COMPUTATION | 2024 / Vol. 36 / Issue 09

Keywords:

CONNECTIONIST MODELS; BACKPROPAGATION; STORAGE;

DOI:

10.1162/neco_a_01684

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, particularly in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as is often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary combinations of loss and activation function that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, performs similarly to regular gradient descent, and performs much better than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent performs as well as or better than gradient descent.
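The update rule described in the abstract can be illustrated with a small sketch. For a single sigmoid output layer, it contrasts a plain gradient step on the mean squared error, which carries the activation derivative, with a step in which that derivative is dropped. The function names, the learning rate, the toy values, and the treatment of centering as subtraction of an input offset `mu` are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mse_gradient_update(W, b, x, t, lr=0.1):
    """Gradient step on 0.5 * ||sigmoid(a) - t||^2; keeps the derivative factor."""
    a = x @ W + b
    h = sigmoid(a)
    delta = (h - t) * h * (1.0 - h)  # error times sigmoid derivative h * (1 - h)
    return W - lr * np.outer(x, delta), b - lr * delta

def hebbian_descent_update(W, b, x, t, mu=None, lr=0.1):
    """Same step with the activation derivative dropped (assumed form).

    mu is a hypothetical input offset used for centering; None means no centering.
    """
    mu = np.zeros_like(x) if mu is None else mu
    xc = x - mu
    h = sigmoid(xc @ W + b)
    delta = h - t                    # error only, no derivative factor
    return W - lr * np.outer(xc, delta), b - lr * delta

# Toy case: the output unit is saturated near 1 while the target is 0.
x = np.array([1.0, 1.0])
t = np.array([0.0])
W = np.full((2, 1), 10.0)            # pre-activation a = 20, deep in saturation
b = np.zeros(1)

W_gd, _ = mse_gradient_update(W, b, x, t)
W_hd, _ = hebbian_descent_update(W, b, x, t)

gd_step = np.linalg.norm(W_gd - W)   # shrinks with the sigmoid derivative
hd_step = np.linalg.norm(W_hd - W)   # stays proportional to the error
print(f"gradient step norm: {gd_step:.3e}, derivative-free step norm: {hd_step:.3e}")
```

In the saturated toy case the derivative-free step remains proportional to the output error, whereas the mean squared error step is scaled down by the near-zero sigmoid derivative. For a sigmoid output, the derivative-free step coincides with the gradient of the cross-entropy loss, which is one of the established pairings the abstract describes as subsumed by Hebbian descent.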


Pages: 1669-1712

Page count: 44
