Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Cited by: 0
Authors
Zhou, Xiao [1]
Zhang, Weizhong [1]
Chen, Zonghao [2]
Diao, Shizhe [1]
Zhang, Tong [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sparse training is a natural idea for accelerating the training of deep neural networks and reducing memory usage, especially since large modern networks are significantly over-parameterized. However, most existing methods fail to achieve this goal in practice, because the chain-rule-based estimators they use for the gradient with respect to the structure parameters require dense computation in at least the backward-propagation step. This paper solves this problem by proposing an efficient sparse training method with completely sparse forward and backward passes. We first formulate the training process as a continuous minimization problem under a global sparsity constraint. We then separate the optimization into two steps, corresponding to the weight update and the structure-parameter update. For the former, we use the conventional chain rule, which can be made sparse by exploiting the sparse network structure. For the latter, instead of the chain-rule-based gradient estimators used in existing methods, we propose a variance-reduced policy gradient estimator that requires only two forward passes and no backward propagation, thus achieving completely sparse training. We prove that the variance of our gradient estimator is bounded. Extensive experiments on real-world datasets demonstrate that, compared to previous methods, our algorithm is much more effective at accelerating training, up to an order of magnitude faster.
Pages: 14
Related Papers
50 records in total
  • [1] Feed Forward Neural Network Sparsification with Dynamic Pruning
    Chouliaras, Andreas
    Fragkou, Evangelia
    Katsaros, Dimitrios
[J]. 25TH PAN-HELLENIC CONFERENCE ON INFORMATICS WITH INTERNATIONAL PARTICIPATION (PCI2021), 2021: 12-17
  • [2] Training Recurrent Neural Networks via Forward Propagation Through Time
    Kag, Anil
    Saligrama, Venkatesh
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [3] Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators
    Chitty-Venkata, Krishna Teja
    Somani, Arun K.
[J]. 2020 IEEE 31ST INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2020), 2020: 37-44
  • [4] The Hessian by blocks for neural network by backward propagation
    Bessi, Radhia
    Gmati, Nabil
[J]. JOURNAL OF TAIBAH UNIVERSITY FOR SCIENCE, 2024, 18 (01)
  • [5] Forward-forward training of an optical neural network
    Oguz, Ilker
    Ke, Junjie
    Weng, Qifei
    Yang, Feng
    Yildirim, Mustafa
    Dinc, Niyazi Ulas
    Hsieh, Jih-Liang
    Moser, Christophe
    Psaltis, Demetri
[J]. OPTICS LETTERS, 2023, 48 (20): 5249-5252
  • [6] Efficient Robust Training via Backward Smoothing
    Chen, Jinghui
    Cheng, Yu
    Gan, Zhe
    Gu, Quanquan
    Liu, Jingjing
[J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 6222-6230
  • [7] A PARALLEL IMPLEMENTATION OF THE BACKWARD ERROR PROPAGATION NEURAL NETWORK TRAINING ALGORITHM - EXPERIMENTS IN EVENT IDENTIFICATION
    SITTIG, DF
    ORR, JA
[J]. COMPUTERS AND BIOMEDICAL RESEARCH, 1992, 25 (06): 547-561
  • [8] Combining Forward and Backward Propagation
    Zaki, Amira
    Abdennadher, Slim
    Fruehwirth, Thom
[J]. FRONTIERS OF COMBINING SYSTEMS, FROCOS 2015, 2015, 9322: 307-322
  • [9] Efficient Recurrent Neural Networks via Importance-Based Sparsification
    Ren, Jiankang
    Ni, Zheng
    Su, Xiaoyan
    Zhang, Haijun
    Li, Haifang
    Li, Shengyu
[J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024
  • [10] Diagnosis of Neural Network via Backward Deduction
    Yin, Peifeng
    Huang, Lei
    Lee, Sunhwan
    Qiao, Mu
    Asthana, Shubhi
Nakamura, Taiga
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019: 260-267