Latent Weight Quantization for Integerized Training of Deep Neural Networks

Cited by: 0

Authors:

Fei, Wen [1]

Dai, Wenrui [2]

Zhang, Liang [3]

Zhang, Luoming [4]

Li, Chenglin [1]

Zou, Junni [2]

Xiong, Hongkai [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China

[3] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China

[4] Zhejiang Univ, Key Lab Biomed Engn, Minist Educ, Hangzhou 310027, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2025, Vol. 47, No. 4

Funding:

National Natural Science Foundation of China;

Keywords:

Quantization (signal); Training; Perturbation methods; Memory management; Hardware; Trajectory; Random access memory; Graphics processing units; Computational modeling; Noise; Integerized training; deep neural network quantization; latent weight; dual quantizer; large language models;

DOI:

10.1109/TPAMI.2025.3527498

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the issue of full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weight and the integerized weight, thereby suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook to avoid frozen weights and to ensure a statistically unbiased training trajectory that matches that of the full-precision latent weight. The codebook is optimized to minimize the disturbance to weight updates under importance guidance and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed schema allows integerized training with latent weights as low as 4 bits for various architectures, including ResNets, MobileNetV2, and Transformers, and yields negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune Large Language Models with up to 13 billion parameters on a single GPU using the proposed schema.
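
As a rough illustration of the idea described in the abstract, the sketch below stores a latent weight as an integerized weight code plus a low-bit residual code, and uses stochastic rounding so that the stored value is unbiased in expectation. It is not the paper's method: the optimized nonuniform codebook and dual quantizer are replaced here by a plain uniform grid, and the function names and the `weight_step` and `residual_bits` parameters are assumptions made for this example only.

```python
import math
import random


def encode_latent(w, weight_step, residual_bits, rng):
    """Split a latent weight into (weight_code, residual_code).

    Assumed layout (hypothetical, for illustration only):
        w ~= weight_code * weight_step + residual_code * fine_step
    where fine_step = weight_step / 2**residual_bits and
    residual_code lies in [0, 2**residual_bits).
    """
    levels = 2 ** residual_bits
    fine_step = weight_step / levels
    position = w / fine_step              # position on the fine grid
    low = math.floor(position)
    # Stochastic rounding: round up with probability equal to the
    # fractional part, so the expected stored value equals w.
    code = low + (1 if rng.random() < position - low else 0)
    # divmod floors, so residual_code is non-negative even when w < 0.
    weight_code, residual_code = divmod(code, levels)
    return weight_code, residual_code


def decode_latent(weight_code, residual_code, weight_step, residual_bits):
    """Reconstruct the stored latent weight from its two integer codes."""
    levels = 2 ** residual_bits
    return (weight_code * levels + residual_code) * (weight_step / levels)
```

With deterministic round-to-nearest, an update smaller than half of `fine_step` would never change the stored codes, which is one way a weight can become frozen; with the stochastic rounding assumed here, such updates are kept on average.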

Pages: 2816-2832

Number of pages: 17
