Residual Quantization for Low Bit-Width Neural Networks

Cited: 5
Authors
Li, Zefan [1 ]
Ni, Bingbing [1 ]
Yang, Xiaokang [1 ]
Zhang, Wenjun [1 ]
Gao, Wen [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Peking Univ, Beijing 100871, Peoples R China
Funding
US National Science Foundation;
Keywords
Quantization (signal); Training; Computational modeling; Neurons; Degradation; Task analysis; Optimization; Deep learning; network quantization; binarization; network acceleration;
DOI
10.1109/TMM.2021.3124095
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Neural network quantization has been shown to be an effective approach to network compression and acceleration. However, existing binary or ternary quantization methods suffer from two major issues. First, low bit-width input/activation quantization easily results in severe prediction accuracy degradation. Second, network training and quantization are typically treated as two unrelated tasks, leading to accumulated parameter training error and quantization error. In this work, we introduce a novel scheme, named Residual Quantization, to train a neural network with both weights and inputs constrained to low bit-width, e.g., binary or ternary values. On one hand, by recursively performing residual quantization, the resulting binary/ternary network is guaranteed to approximate the full-precision network with much smaller errors. On the other hand, we mathematically re-formulate the network training scheme in an EM-like manner, which iteratively performs network quantization and parameter optimization. During expectation, the low bit-width network is encouraged to approximate the full-precision network. During maximization, the low bit-width network is further tuned to gain better representation capability. Extensive experiments demonstrate that the proposed quantization scheme outperforms previous low bit-width methods and achieves performance much closer to that of the full-precision counterpart.
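The core idea of the abstract, approximating a full-precision tensor by recursively quantizing the residual left over from the previous quantization step, can be illustrated with a short sketch. The Python/NumPy code below is a minimal illustration under assumptions, not the paper's exact algorithm: it assumes the common sign() binarization with the closed-form scale alpha = mean(|residual|) (standard in the binary-network literature), and the function name residual_binary_quantize is hypothetical. In the paper's EM-like scheme, such a quantization step would play the role of the expectation step, alternated with parameter tuning of the low bit-width network.

    import numpy as np

    def residual_binary_quantize(w, num_steps=2):
        """Greedy residual binarization sketch: approximate a
        full-precision tensor w as a sum of scaled binary tensors,
            w ~= sum_k alpha_k * B_k,   B_k in {-1, +1}.
        Assumes sign() binarization with the L2-optimal scale
        alpha_k = mean(|residual|); the paper's exact rule may differ.
        """
        residual = w.astype(np.float64)
        scales, bases = [], []
        for _ in range(num_steps):
            b = np.sign(residual)
            b[b == 0] = 1.0                   # map exact zeros to +1
            alpha = np.abs(residual).mean()   # closed-form optimal scale
            scales.append(alpha)
            bases.append(b)
            residual = residual - alpha * b   # quantize the residual next
        return scales, bases

    # Each extra residual step shrinks the approximation error, which is
    # why the binary/ternary network can track the full-precision one.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64))
    for k in (1, 2, 3):
        scales, bases = residual_binary_quantize(w, num_steps=k)
        approx = sum(a * b for a, b in zip(scales, bases))
        print(k, np.linalg.norm(w - approx) / np.linalg.norm(w))

For Gaussian-like weights, the one-step relative error is already around 0.6 and drops with each added residual term, matching the abstract's claim that recursion yields a much smaller approximation error than one-shot binarization.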
Pages: 214-227
Number of pages: 14