Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

Cited by: 1
Authors
Shim, Kyuhong [1]
Choi, Iksoo [1]
Sung, Wonyong [1]
Choi, Jungwook [2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
pruning; transformer; multihead attention;
DOI
10.1109/ISOCC53507.2021.9613933
Chinese Library Classification (CLC)
TP3 [Computing technology and computer technology];
Subject classification code
0812;
Abstract
Recently, the necessity of multiple attention heads in the transformer architecture has been questioned [1]. Removing less important heads from a large network is a promising strategy for reducing computation cost and parameters. However, pruning attention heads in multihead attention does not evenly reduce the overall load, because the feedforward modules are unaffected. In this study, we apply attention head pruning to the All-attention [2] transformer, where the computational savings are proportional to the number of pruned heads. This improved computing efficiency comes at the cost of increased pruning sensitivity, which we stabilize with three training techniques. Our attention head pruning achieves a considerably smaller number of parameters with comparable perplexity for transformer-based language modeling.
Pages: 357-358
Number of pages: 2
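
To make the head-pruning idea in the abstract concrete, the sketch below shows a minimal gated multi-head attention layer in PyTorch. This is an illustrative assumption, not the authors' implementation: each head's output is scaled by a learnable gate, and heads whose gates shrink toward zero (e.g., under a sparsity penalty applied during training) can be removed afterwards. The class GatedMultiHeadAttention, the gate parameter head_gates, and the helper prunable_heads with its threshold argument are hypothetical names introduced here for illustration only.

    import torch
    import torch.nn as nn

    class GatedMultiHeadAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            assert d_model % n_heads == 0
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.out = nn.Linear(d_model, d_model)
            # One learnable gate per head; a sparsity penalty on these gates
            # during training (not shown) would push unimportant heads toward zero.
            self.head_gates = nn.Parameter(torch.ones(n_heads))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Split the model dimension into heads: (B, n_heads, T, d_head).
            q, k, v = (t.reshape(B, T, self.n_heads, self.d_head).transpose(1, 2)
                       for t in (q, k, v))
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            ctx = attn @ v                                  # (B, n_heads, T, d_head)
            ctx = ctx * self.head_gates.view(1, -1, 1, 1)   # scale each head's output
            ctx = ctx.transpose(1, 2).reshape(B, T, -1)
            return self.out(ctx)

    def prunable_heads(layer: GatedMultiHeadAttention, threshold: float = 0.05):
        # Heads whose gate magnitude fell below the threshold can be dropped,
        # reducing parameters and computation roughly in proportion to their count.
        return (layer.head_gates.abs() < threshold).nonzero(as_tuple=True)[0].tolist()

As a usage example, after training one could call prunable_heads(layer) on each layer to list removable heads. In an All-attention-style block, where the feedforward sublayer is folded into attention, dropping those heads shrinks the whole layer rather than only the attention projections, which is the proportional saving the abstract refers to.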