Towards Understanding Neural Machine Translation with Attention Heads' Importance

Cited by: 0
Authors
Zhou, Zijie [1 ,2 ]
Zhu, Junguo [1 ,2 ]
Li, Weijiang [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Prov Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 7
Funding
National Natural Science Foundation of China
Keywords
neural machine translation; interpretability; linguistics;
DOI
10.3390/app14072798
CLC Classification Number
O6 [Chemistry]
Subject Classification Code
0703
Abstract
Although neural machine translation has made great progress and the Transformer has advanced the state of the art across various language pairs, the decision-making process of the attention mechanism, a crucial component of the Transformer, remains unclear. In this paper, we propose to understand the model's decisions through the importance of its attention heads. We explore the knowledge acquired by the attention heads, elucidating the decision-making process through the lens of linguistic understanding. Specifically, we quantify the importance of each attention head by assessing its contribution to neural machine translation performance, employing a Masking Attention Heads approach. We evaluate the method and investigate the distribution of attention heads' importance, as well as its correlation with part-of-speech contribution. To understand the diverse decisions made by attention heads, we concentrate on analyzing multi-granularity linguistic knowledge. Our findings indicate that specialized heads play a crucial role in learning linguistics. By retaining the important attention heads and removing the unimportant ones, we can streamline the attention mechanism, reducing the number of model parameters and increasing the model's speed. Moreover, by leveraging the connection between attention heads and multi-granular linguistic knowledge, we can enhance the model's interpretability. Consequently, our research provides valuable insights for the design of improved NMT models.
Pages: 22
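
As a rough illustration of the Masking Attention Heads approach described in the abstract, the sketch below scores each head by the BLEU lost when that single head is zeroed out. This is a minimal sketch, not the paper's implementation: the translate and bleu callables and the per-head 0/1 mask format are assumptions standing in for whatever interface the underlying NMT toolkit exposes.

```python
# Sketch of head-importance scoring via masking: translate the test set
# with all heads active, then re-translate with exactly one head silenced
# and record the BLEU drop. Larger drop = more important head.
# `translate` and `bleu` are hypothetical callables, not a real library API.

from typing import Callable, List, Tuple

def head_importance(
    translate: Callable[[List[str], List[List[int]]], List[str]],
    bleu: Callable[[List[str], List[str]], float],
    sources: List[str],
    references: List[str],
    n_layers: int,
    n_heads: int,
) -> List[Tuple[int, int, float]]:
    """Return (layer, head, importance) triples sorted by descending importance.

    Importance is the BLEU lost when the head is masked, so specialized
    heads (large drop) rank first and redundant heads rank last.
    """
    full_mask = [[1] * n_heads for _ in range(n_layers)]
    baseline = bleu(translate(sources, full_mask), references)

    scores = []
    for layer in range(n_layers):
        for head in range(n_heads):
            mask = [row[:] for row in full_mask]
            mask[layer][head] = 0  # silence exactly one head
            hypotheses = translate(sources, mask)
            scores.append((layer, head, baseline - bleu(hypotheses, references)))

    return sorted(scores, key=lambda s: s[2], reverse=True)
```

Heads near the bottom of the resulting ranking are the natural candidates for pruning, which is how the parameter reduction and speed gains mentioned in the abstract would be realized.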