Deep Compression of Pre-trained Transformer Models

Cited by: 0
Authors
Wang, Naigang [1 ]
Liu, Chi-Chun [1 ]
Venkataramani, Swagath [1 ]
Sen, Sanchari [1 ]
Chen, Chia-Yu [1 ]
El Maghraoui, Kaoutar [1 ]
Srinivasan, Vijayalakshmi [1 ]
Chang, Leland [1 ]
Affiliations
[1] IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Pre-trained transformer models have achieved remarkable success in natural language processing (NLP) and have recently become competitive alternatives to Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in vision and speech tasks, respectively. Due to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data at the expense of tremendous growth in model size. As high-performance, large-scale, pre-trained transformer models become increasingly available for users to download and fine-tune for customized downstream tasks, their deployment becomes challenging due to the vast amount of operations and large memory footprint. To address this challenge, we introduce methods to deeply compress pre-trained transformer models across three major application domains: NLP, speech, and vision. Specifically, we quantize transformer backbones down to 4-bit and further achieve 50% fine-grained structural sparsity on pre-trained BERT, Wav2vec2.0, and Vision Transformer (ViT) models to demonstrate 16x compression while maintaining model accuracy. This is achieved by identifying critical initialization strategies for quantization- and sparsity-aware fine-tuning as well as developing novel techniques such as quantizers with a zero-preserving format and scheduled dropout. These hardware-friendly techniques need only be applied in the fine-tuning phase for downstream tasks, which renders them especially suitable for acceleration and deployment of pre-trained transformer models.
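The abstract names two hardware-friendly ingredients: a 4-bit quantizer whose value grid preserves exact zeros, and 50% fine-grained structural (2-out-of-4) sparsity. The sketch below is a minimal, hypothetical PyTorch illustration of how these two steps could be combined on a single weight tensor; the per-tensor symmetric scaling, the magnitude-based mask selection, and the function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): fake 4-bit quantization on a
# symmetric grid that contains an exact zero, combined with a magnitude-based
# 2-out-of-4 mask that yields 50% fine-grained structural sparsity.
import torch


def zero_preserving_int4_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Fake-quantize weights onto a symmetric grid of 2**bits - 1 levels.

    An odd number of levels (-7 ... +7 for 4 bits) keeps 0.0 on the grid,
    so weights zeroed by pruning remain exactly zero after quantization.
    """
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale (per-channel is also common)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                               # dequantized ("fake-quantized") weights


def two_of_four_mask(w: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every contiguous group of 4."""
    groups = w.reshape(-1, 4)                      # assumes w.numel() is a multiple of 4
    keep = groups.abs().topk(k=2, dim=1).indices   # indices of the 2 largest |w| per group
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return mask.reshape(w.shape)


# Example: compress one linear-layer weight matrix.
w = torch.randn(8, 16)
w_sparse = w * two_of_four_mask(w)                 # 50% fine-grained structural sparsity
w_compressed = zero_preserving_int4_quantize(w_sparse)
assert (w_compressed == 0).float().mean() >= 0.5   # pruned zeros survive quantization
```

In the quantization- and sparsity-aware fine-tuning the abstract describes, such a mask and quantizer would typically be applied to the weights on the forward pass, with gradients passed through on the backward pass; that training loop is omitted from this sketch.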
Pages: 15
Related Papers (showing items [31]–[40] of 50)
  • [31] Pre-trained Models for Sonar Images
    Valdenegro-Toro, Matias
    Preciado-Grijalva, Alan
    Wehbe, Bilal
    OCEANS 2021: SAN DIEGO - PORTO, 2021,
  • [32] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [33] Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
    Xiao, Chaojun
    Luo, Yuqi
    Mang, Wenbi
    Zhang, Pengle
    Han, Xu
    Lin, Yankai
    Zhang, Zhengyan
    Xie, Ruobing
    Liu, Zhiyuan
    Sun, Maosong
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9947 - 9959
  • [34] Underwater Image Enhancement Using Pre-trained Transformer
    Boudiaf, Abderrahmene
    Guo, Yuhang
    Ghimire, Adarsh
    Werghi, Naoufel
    De Masi, Giulia
    Javed, Sajid
    Dias, Jorge
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT III, 2022, 13233 : 480 - 488
  • [35] Generative Pre-Trained Transformer for Cardiac Abnormality Detection
    Gaudilliere, Pierre Louis
    Sigurthorsdottir, Halla
    Aguet, Clementine
    Van Zaen, Jerome
    Lemay, Mathieu
    Delgado-Gonzalo, Ricard
    2021 COMPUTING IN CARDIOLOGY (CINC), 2021,
  • [36] OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
    Chen, Le
    Bhattacharjee, Arijit
    Ahmed, Nesreen
    Hasabnis, Niranjan
    Oren, Gal
    Vo, Vy
    Jannesari, Ali
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 121 - 134
  • [37] Multi-task Active Learning for Pre-trained Transformer-based Models
    Rotman, Guy
    Reichart, Roi
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1209 - 1228
  • [38] Incorporating Pre-trained Transformer Models into TextCNN for Sentiment Analysis on Software Engineering Texts
    Sun, Kexin
    Shi, XiaoBo
    Gao, Hui
    Kuang, Hongyu
    Ma, Xiaoxing
    Rong, Guoping
    Shao, Dong
    Zhao, Zheng
    Zhang, He
    13TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2022, 2022, : 127 - 136
  • [39] A Transformer Based Approach To Detect Suicidal Ideation Using Pre-Trained Language Models
    Haque, Farsheed
    Nur, Ragib Un
    Al Jahan, Shaeekh
    Mahmud, Zarar
    Shah, Faisal Muhammad
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [40] Attentional Masking for Pre-trained Deep Networks
    Wallenberg, Marcus
    Forssen, Per-Erik
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 6149 - 6154