Latent Weight Quantization for Integerized Training of Deep Neural Networks

Times Cited: 0
Authors
Fei, Wen [1 ]
Dai, Wenrui [2 ]
Zhang, Liang [3 ]
Zhang, Luoming [4 ]
Li, Chenglin [1 ]
Zou, Junni [2 ]
Xiong, Hongkai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[3] Donghua Univ, Sch Comp Sci & Technol, Shanghai 201620, Peoples R China
[4] Zhejiang Univ, Key Lab Biomed Engn, Minist Educ, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Training; Perturbation methods; Memory management; Hardware; Trajectory; Random access memory; Graphics processing units; Computational modeling; Noise; Integerized training; deep neural network quantization; latent weight; dual quantizer; large language models;
DOI
10.1109/TPAMI.2025.3527498
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between latent weights and integerized weights, thereby suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook to avoid frozen weights and to ensure a training trajectory that is statistically unbiased with respect to full-precision latent weights. The codebook is optimized under importance guidance to minimize the disturbance to weight updates, and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed schema enables integerized training with latent weights as low as 4-bit for various architectures, including ResNets, MobileNetV2, and Transformers, with negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune Large Language Models with up to 13 billion parameters on a single GPU using the proposed schema.
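The abstract describes storing a quantized residual of the latent weight rather than the full-precision latent weight itself. Below is a minimal, illustrative sketch of that idea under simplifying assumptions: all function and variable names are hypothetical, a symmetric uniform quantizer stands in for the paper's optimized dual quantizer with nonuniform (three-segment polyline) codebook, and the toy SGD step is not the authors' training pipeline.

```python
# Sketch (not the authors' implementation): keep only a low-bit residual
# between the full-precision latent weight and the dequantized integerized
# weight, so no full-precision latent weight buffer is stored.
import numpy as np

def quantize_weight(w, bits=4):
    """Symmetric uniform quantization of the integerized weight."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax + 1e-12
    w_int = np.clip(np.round(w / scale), -qmax, qmax)
    return w_int.astype(np.int8), scale

def quantize_residual(residual, bits=4):
    """Low-bit quantization of the latent-weight residual.
    The paper uses an optimized nonuniform codebook; a uniform
    quantizer is used here purely for illustration."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(residual)) / qmax + 1e-12
    r_int = np.clip(np.round(residual / scale), -qmax, qmax)
    return r_int.astype(np.int8), scale

# Toy update step: the stored optimizer state is (w_int, r_int) plus two
# scalars, never the full-precision latent weight tensor.
rng = np.random.default_rng(0)
w_latent = rng.normal(size=1024).astype(np.float32)
grad = rng.normal(scale=0.01, size=1024).astype(np.float32)

w_int, w_scale = quantize_weight(w_latent)        # integerized weight
residual = w_latent - w_int * w_scale             # decorrelated residual
r_int, r_scale = quantize_residual(residual)      # low-bit latent residual

# Reconstruct the latent weight, apply the gradient update, re-quantize.
w_rec = w_int * w_scale + r_int * r_scale
w_rec -= 0.1 * grad                               # SGD step, lr = 0.1
w_int, w_scale = quantize_weight(w_rec)
r_int, r_scale = quantize_residual(w_rec - w_int * w_scale)
```

Quantizing the residual rather than the latent weight itself is what removes the correlation between the stored state and the integerized weight; how the nonuniform codebook is optimized is detailed in the paper and not reproduced here.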
Pages: 2816 - 2832
Page count: 17
Related Papers (10 of 50 shown)
  • [1] Communication Quantization for Data-parallel Training of Deep Neural Networks
    Dryden, Nikoli
    Moon, Tim
    Jacobs, Sam Ade
    Van Essen, Brian
    PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC), 2016, : 1 - 8
  • [2] Centered Weight Normalization in Accelerating Training of Deep Neural Networks
    Huang, Lei
    Liu, Xianglong
    Liu, Yang
    Lang, Bo
    Tao, Dacheng
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2822 - 2830
  • [3] Robust Quantization of Deep Neural Networks
    Kim, Youngseok
    Lee, Junyeol
    Kim, Younghoon
    Seo, Jiwon
    PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC '20), 2020, : 74 - 84
  • [4] Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks
    Latotzke, Cecilia
    Balim, Batuhan
    Gemmeke, Tobias
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1559 - 1566
  • [5] Latent Training for Convolutional Neural Networks
    Huang, Zi
    Liu, Qi
    Chen, Zhiyuan
    Zhao, Yuming
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ESTIMATION, DETECTION AND INFORMATION FUSION ICEDIF 2015, 2015, : 55 - 60
  • [6] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
    Salimans, Tim
    Kingma, Diederik P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [7] Phase-limited quantization-aware training for diffractive deep neural networks
    Wang, Yu
    Sha, Qi
    Qi, Feng
    APPLIED OPTICS, 2025, 64 (06) : 1413 - 1419
  • [8] A survey of quantization methods for deep neural networks
    Yang C.
    Zhang R.
    Huang L.
    Ti S.
    Lin J.
    Dong Z.
    Chen S.
    Liu Y.
    Yin X.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2023, 45 (10): : 1613 - 1629
  • [9] On the Effect of Quantization on Deep Neural Networks Performance
    Tmamna, Jihene
    Fourati, Rahma
    Ltifi, Hela
    ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I, 2024, 2165 : 144 - 156
  • [10] Quantization and Deployment of Deep Neural Networks on Microcontrollers
    Novac, Pierre-Emmanuel
    Boukli Hacene, Ghouthi
    Pegatoquet, Alain
    Miramond, Benoit
    Gripon, Vincent
    SENSORS, 2021, 21 (09)