Optimizing nonlinear activation function for convolutional neural networks

Times Cited: 37
Authors
Varshney, Munender [1 ]
Singh, Pravendra [1 ]
Affiliations
[1] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
Keywords
FReLU; ReLU; CNN; Convolutional neural network; Activation function;
DOI
10.1007/s11760-021-01863-z
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Activation functions play a critical role in the training and performance of deep convolutional neural networks. Currently, the rectified linear unit (ReLU) is the most commonly used activation function for deep CNNs. ReLU is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. In this work, we propose a novel approach to generalize the ReLU activation function using multiple learnable slope parameters. These learnable slope parameters are optimized for every channel, which leads to the learning of a more generalized activation function (a variant of ReLU) for each channel. This activation is named the fully parametric rectified linear unit (FReLU) and is trained using an alternating optimization technique that learns one set of parameters while keeping the other set frozen. Our experiments show that the method outperforms ReLU and its other variants, and also generalizes across various tasks such as image classification, object detection, and action recognition in videos. The Top-1 classification accuracy of FReLU on ImageNet improves by 3.75% for MobileNet and by about 2% for ResNet-50 over ReLU. We also provide various analyses for better interpretability of the proposed activation function.
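A minimal PyTorch-style sketch of the idea described in the abstract, kept deliberately generic: each channel carries its own learnable slope parameters for the positive and negative parts of the input, and a small helper toggles which parameter set is trainable to mimic the alternating optimization scheme. The class name FReLUSketch, the parameter names pos_slope/neg_slope, and the initial values are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn

class FReLUSketch(nn.Module):
    # Per-channel ReLU variant with learnable slopes (illustrative sketch, not the paper's code).
    def __init__(self, num_channels: int):
        super().__init__()
        # Assumed initialization: slope 1 on the positive side, 0 on the negative side,
        # so every channel starts out as a plain ReLU.
        self.pos_slope = nn.Parameter(torch.ones(num_channels))
        self.neg_slope = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, height, width); broadcast slopes over spatial dims.
        pos = self.pos_slope.view(1, -1, 1, 1)
        neg = self.neg_slope.view(1, -1, 1, 1)
        return torch.where(x > 0, pos * x, neg * x)

def set_slope_params_trainable(model: nn.Module, trainable: bool) -> None:
    # Alternating-optimization sketch: freeze the activation slopes while the other
    # network weights are updated, then unfreeze them (and freeze the rest) in the next phase.
    for module in model.modules():
        if isinstance(module, FReLUSketch):
            for p in module.parameters():
                p.requires_grad_(trainable)

In a training loop one would keep two optimizers (or two parameter groups) and switch phases periodically, so that one parameter set is learned while the other stays frozen, as the abstract describes.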
Pages: 1323 - 1330
Number of pages: 8
Related Papers
50 in total
  • [42] Function of nonlinear asymmetrical neural networks
    Nagoya Inst of Technology, Nagoya-shi, Japan
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS, COMMUNICATIONS AND COMPUTER SCIENCES, (9): 1604 - 1609
  • [43] Optimizing Hyperparameters for Thai Cuisine Recognition via Convolutional Neural Networks
    Theera-Ampornpunt, Nawanol
    Treepong, Panisa
    TRAITEMENT DU SIGNAL, 2023, 40 (03) : 1187 - 1193
  • [44] Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks
    Chen, Zhiyuan
    Ma, Yufei
    Wang, Zhongfeng
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2020,
  • [45] Optimizing convolutional neural networks on multi-core vector accelerator
    Liu, Zhong
    Xiao, Xin
    Li, Chen
    Ma, Sheng
    Deng, Rangyu
    PARALLEL COMPUTING, 2022, 112
  • [46] Towards Optimizing Convolutional Neural Networks for Robotic Surgery Skill Evaluation
    Castro, Dayvid
    Pereira, Danilo
    Zanchettin, Cleber
    Macedo, David
    Bezerra, Byron L. D.
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [47] Optimizing Pretrained Convolutional Neural Networks for Tomato Leaf Disease Detection
    Ahmad, Iftikhar
    Hamid, Muhammad
    Yousaf, Suhail
    Shah, Syed Tanveer
    Ahmad, Muhammad Ovais
    COMPLEXITY, 2020, 2020
  • [48] Object recognition algorithm based on optimized nonlinear activation function-global convolutional neural network
    Feng-Ping An
    Jun-e Liu
    Lei Bai
    The Visual Computer, 2022, 38 : 541 - 553
  • [49] Object recognition algorithm based on optimized nonlinear activation function-global convolutional neural network
    An, Feng-Ping
    Liu, Jun-e
    Bai, Lei
    VISUAL COMPUTER, 2022, 38 (02) : 541 - 553
  • [50] A surface-normal photodetector as nonlinear activation function in diffractive optical neural networks
    Ashtiani, F.
    Idjadi, M. H.
    Hu, T. C.
    Grillanda, S.
    Neilson, D.
    Earnshaw, M.
    Cappuzzo, M.
    Kopf, R.
    Tate, A.
    Blanco-Redondo, A.
    APL PHOTONICS, 2023, 8 (12)