Deep Compression of Pre-trained Transformer Models

Cited by: 0
Authors
Wang, Naigang [1]
Liu, Chi-Chun [1]
Venkataramani, Swagath [1]
Sen, Sanchari [1]
Chen, Chia-Yu [1]
El Maghraoui, Kaoutar [1]
Srinivasan, Vijayalakshmi [1]
Chang, Leland [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained transformer models have achieved remarkable success in natural language processing (NLP) and have recently become competitive alternatives to Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in vision and speech tasks, respectively. Owing to their excellent computational efficiency and scalability, transformer models can be trained on exceedingly large amounts of data, at the expense of tremendous growth in model size. As high-performance, large-scale, pre-trained transformer models become increasingly available for users to download and fine-tune for customized downstream tasks, their deployment becomes challenging due to the vast number of operations and the large memory footprint. To address this challenge, we introduce methods to deeply compress pre-trained transformer models across three major application domains: NLP, speech, and vision. Specifically, we quantize transformer backbones down to 4-bit and further achieve 50% fine-grained structural sparsity on pre-trained BERT, Wav2vec2.0, and Vision Transformer (ViT) models, demonstrating 16x compression while maintaining model accuracy. This is achieved by identifying critical initialization strategies for quantization- and sparsity-aware fine-tuning, as well as by developing novel techniques such as quantizers with a zero-preserving format and scheduled dropout. These hardware-friendly techniques need to be applied only in the fine-tuning phase for downstream tasks, which makes them especially suitable for the acceleration and deployment of pre-trained transformer models.
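To make the compression recipe described in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the two operations it combines: symmetric 4-bit fake quantization and 50% fine-grained (2-out-of-4) structural sparsity. It assumes PyTorch, and the function names quantize_4bit and prune_2_of_4 are hypothetical; the paper's zero-preserving quantizer format, initialization strategies, and scheduled dropout are not reproduced here.

# Illustrative sketch only (assumes PyTorch): symmetric 4-bit fake quantization
# combined with a 2-out-of-4 (50%) fine-grained sparsity mask.
# quantize_4bit and prune_2_of_4 are hypothetical names, not from the paper.
import torch

def quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: values are rounded onto a
    # 4-bit grid but kept in floating point for simulation.
    qmax = 7                                  # signed 4-bit symmetric range [-7, 7]
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def prune_2_of_4(w: torch.Tensor) -> torch.Tensor:
    # Zero out the 2 smallest-magnitude weights in every group of 4,
    # yielding 50% fine-grained structural sparsity.
    groups = w.reshape(-1, 4)
    smallest = groups.abs().argsort(dim=1)[:, :2]
    mask = torch.ones_like(groups).scatter_(1, smallest, 0.0)
    return (groups * mask).reshape(w.shape)

if __name__ == "__main__":
    w = torch.randn(64, 64)                   # stand-in for a transformer weight matrix
    w_c = quantize_4bit(prune_2_of_4(w))
    print("sparsity:", (w_c == 0).float().mean().item())   # >= 0.5 by construction

Because this quantizer maps zero exactly to zero, the pruned positions remain zero after quantization in this sketch, which is the kind of property a zero-preserving format is meant to guarantee.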
Pages: 15