Deep Compression of Pre-trained Transformer Models

Cited by: 0
Authors
Wang, Naigang [1]
Liu, Chi-Chun [1]
Venkataramani, Swagath [1]
Sen, Sanchari [1]
Chen, Chia-Yu [1]
El Maghraoui, Kaoutar [1]
Srinivasan, Vijayalakshmi [1]
Chang, Leland [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained transformer models have achieved remarkable success in natural language processing (NLP) and have recently become competitive alternatives to Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in vision and speech tasks, respectively. Owing to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data, at the expense of tremendous growth in model size. As high-performance, large-scale, pre-trained transformer models become increasingly available for users to download and fine-tune for customized downstream tasks, their deployment becomes challenging due to the vast number of operations and the large memory footprint. To address this challenge, we introduce methods to deeply compress pre-trained transformer models across three major application domains: NLP, speech, and vision. Specifically, we quantize transformer backbones down to 4-bit and further achieve 50% fine-grained structural sparsity on pre-trained BERT, Wav2vec2.0, and Vision Transformer (ViT) models, demonstrating 16x compression while maintaining model accuracy. This is achieved by identifying critical initialization strategies for quantization- and sparsity-aware fine-tuning, as well as by developing novel techniques such as quantizers with a zero-preserving format and scheduled dropout. These hardware-friendly techniques need to be applied only in the fine-tuning phase for downstream tasks, which makes them especially suitable for the acceleration and deployment of pre-trained transformer models.
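To make the compression recipe described in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the two operations it combines: symmetric 4-bit fake quantization and 50% fine-grained (2-out-of-4) structural sparsity. It assumes PyTorch, and the function names quantize_4bit and prune_2_of_4 are hypothetical; the paper's zero-preserving quantizer format, initialization strategies, and scheduled dropout are not reproduced here.

# Illustrative sketch only (assumes PyTorch): symmetric 4-bit fake quantization
# combined with a 2-out-of-4 (50%) fine-grained sparsity mask.
# quantize_4bit and prune_2_of_4 are hypothetical names, not from the paper.
import torch

def quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: values are rounded onto a
    # 4-bit grid but kept in floating point for simulation.
    qmax = 7                                  # signed 4-bit symmetric range [-7, 7]
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def prune_2_of_4(w: torch.Tensor) -> torch.Tensor:
    # Zero out the 2 smallest-magnitude weights in every group of 4,
    # yielding 50% fine-grained structural sparsity.
    groups = w.reshape(-1, 4)
    smallest = groups.abs().argsort(dim=1)[:, :2]
    mask = torch.ones_like(groups).scatter_(1, smallest, 0.0)
    return (groups * mask).reshape(w.shape)

if __name__ == "__main__":
    w = torch.randn(64, 64)                   # stand-in for a transformer weight matrix
    w_c = quantize_4bit(prune_2_of_4(w))
    print("sparsity:", (w_c == 0).float().mean().item())   # >= 0.5 by construction

Because this quantizer maps zero exactly to zero, the pruned positions remain zero after quantization in this sketch, which is the kind of property a zero-preserving format is meant to guarantee.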
Pages: 15