Deep Compression of Pre-trained Transformer Models

Cited by: 0
Authors
Wang, Naigang [1 ]
Liu, Chi-Chun [1 ]
Venkataramani, Swagath [1 ]
Sen, Sanchari [1 ]
Chen, Chia-Yu [1 ]
El Maghraoui, Kaoutar [1 ]
Srinivasan, Vijayalakshmi [1 ]
Chang, Leland [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords: (none listed)
DOI: not available
CLC number: TP18 [Theory of Artificial Intelligence]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Pre-trained transformer models have achieved remarkable success in natural language processing (NLP) and have recently become competitive alternatives to Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in vision and speech tasks, respectively. Due to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data at the expense of tremendous growth in model size. As high-performance, large-scale pre-trained transformer models become increasingly available for users to download and fine-tune for customized downstream tasks, their deployment becomes challenging due to their vast number of operations and large memory footprint. To address this challenge, we introduce methods to deeply compress pre-trained transformer models across three major application domains: NLP, speech, and vision. Specifically, we quantize transformer backbones down to 4-bit precision and further achieve 50% fine-grained structural sparsity on pre-trained BERT, Wav2vec2.0, and Vision Transformer (ViT) models to demonstrate 16x compression while maintaining model accuracy. This is achieved by identifying critical initialization strategies for quantization- and sparsity-aware fine-tuning, as well as by developing novel techniques such as quantizers with a zero-preserving format and scheduled dropout. These hardware-friendly techniques need to be applied only during fine-tuning for downstream tasks, which makes them especially suitable for acceleration and deployment of pre-trained transformer models.
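The 16x figure is consistent with an FP32 baseline: roughly 8x from replacing 32-bit weights with 4-bit weights, times 2x from 50% sparsity, assuming pruned weights need not be stored. As a rough illustration of the two mechanisms the abstract names, the sketch below combines a zero-preserving symmetric 4-bit fake quantizer with a 50% fine-grained mask that keeps the two largest-magnitude weights in each group of four (a 2:4-style pattern). The function names, per-tensor scaling, and group size are assumptions made here for illustration, not the authors' implementation.

# Illustrative PyTorch sketch only; names and grouping choices are assumptions,
# not the paper's actual method.
import torch

def quantize_4bit_zero_preserving(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Symmetric uniform "fake" quantizer: zero maps exactly to zero, so weights
    # pruned to zero by the sparsity mask remain zero after quantization.
    qmax = 2 ** (num_bits - 1) - 1                   # 7 for signed 4-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax     # per-tensor scale (assumed)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                                 # dequantized values used during fine-tuning

def mask_50pct_fine_grained(w: torch.Tensor, group: int = 4) -> torch.Tensor:
    # Keep the 2 largest-magnitude weights in every group of 4, giving
    # 50% fine-grained structured sparsity.
    groups = w.reshape(-1, group)
    keep = groups.abs().topk(group // 2, dim=1).indices
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return (groups * mask).reshape(w.shape)

# Example: compress the weights of one (hypothetical) transformer linear layer.
w = torch.randn(768, 3072)
w_sparse = mask_50pct_fine_grained(w)                   # 50% structured sparsity
w_deploy = quantize_4bit_zero_preserving(w_sparse)      # 4-bit quantization
print((w_deploy == 0).float().mean())                   # at least ~0.5 of entries are exactly zero

In a quantization- and sparsity-aware fine-tuning setup, such masking and fake quantization would be applied to the weights in the forward pass so that the remaining weights can adapt to the compression during training on the downstream task.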
Pages: 15