Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation

Cited by: 0

Authors:

Behnke, Maximiliana [1]

Heafield, Kenneth [1]

Affiliations:

[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland

Source:

PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be pruned after training. However, removing them before training a model results in lower quality. In this paper, we apply the lottery ticket hypothesis to prune heads in the early stages of training, instead of doing so on a fully converged model. Our experiments on machine translation show that it is possible to remove up to three-quarters of all attention heads from a transformer-big model with an average -0.1 change in BLEU for Turkish -> English. The pruned model is 1.5 times as fast at inference, albeit at the cost of longer training. The method is complementary to other approaches, such as teacher-student, with our English -> German student losing 0.2 BLEU at 75% encoder attention sparsity.
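The idea of pruning heads that are "not confident" can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that a head's confidence is the mean of its maximum attention weight over query positions, and that the least confident active heads are removed at each pruning step. The abstract does not specify the measure or the schedule, and the lottery-ticket retraining step is not shown. The names `head_confidence` and `select_heads_to_prune` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

HeadId = Tuple[int, int]  # (layer index, head index); hypothetical naming


def head_confidence(attention_rows: List[List[float]]) -> float:
    """Mean of the maximum attention weight over all query positions.

    Assumption: this is one plausible confidence measure, not necessarily
    the one used in the paper.
    """
    return sum(max(row) for row in attention_rows) / len(attention_rows)


def select_heads_to_prune(
    attention: Dict[HeadId, List[List[float]]],
    already_pruned: Set[HeadId],
    n_to_prune: int,
) -> List[HeadId]:
    """Return the n least confident heads among those still active."""
    active = [h for h in attention if h not in already_pruned]
    ranked = sorted(active, key=lambda h: head_confidence(attention[h]))
    return ranked[:n_to_prune]


# Toy attention distributions: each row holds one query position's weights.
attention = {
    (0, 0): [[0.9, 0.1], [0.8, 0.2]],    # sharp, confident head
    (0, 1): [[0.5, 0.5], [0.6, 0.4]],    # diffuse head
    (1, 0): [[0.7, 0.3], [0.7, 0.3]],
    (1, 1): [[0.55, 0.45], [0.5, 0.5]],  # diffuse head
}

# One pruning step that removes half of the heads.
pruned = set(select_heads_to_prune(attention, set(), n_to_prune=2))
```

In an iterative setting, `pruned` would be passed back as `already_pruned` on the next step, so the selection only ever considers heads that are still active.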


Pages: 2664-2674

Number of pages: 11
