Tree Transformer: Integrating Tree Structures into Self-Attention

Citations: 0
Authors:
Wang, Yau-Shian [1 ]
Lee, Hung-Yi [1 ]
Chen, Yun-Nung [1 ]
Affiliations:
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords: none listed
DOI: none
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task has achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be induced automatically from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between adjacent words. With the same training procedure as BERT, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
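The abstract only sketches how the "Constituent Attention" module works. Below is a minimal PyTorch sketch of one plausible reading of it: adjacent-word link scores, a per-layer merge probability for each neighboring pair, and a span-level constituent prior formed as the product of merge probabilities along the span. The function name constituent_prior, the projections w_q and w_k, and the exact normalization details are our assumptions for illustration, not the authors' released implementation.

```python
import torch

def constituent_prior(h, w_q, w_k, a_prev=None, eps=1e-9):
    """Sketch of a constituent prior for one encoder layer.

    h:       (B, T, d) hidden states of the layer.
    w_q/w_k: (d, d) link query/key projections (illustrative names).
    a_prev:  (B, T-1) merge probabilities from the previous layer, if any.
    Returns C (B, T, T), a soft prior that tokens i..j form a constituent,
    and the updated merge probabilities a (B, T-1) for the next layer.
    """
    B, T, d = h.shape
    q, k = h @ w_q, h @ w_k
    # Score each token against its right and left neighbour.
    s_right = (q[:, :-1] * k[:, 1:]).sum(-1) / d ** 0.5   # i -> i+1
    s_left = (q[:, 1:] * k[:, :-1]).sum(-1) / d ** 0.5    # i+1 -> i
    # Each token normalises over its (at most) two neighbours;
    # boundary tokens get -inf for the missing side.
    pad = torch.full((B, 1), float("-inf"))
    to_left = torch.cat([pad, s_left], dim=1)             # (B, T)
    to_right = torch.cat([s_right, pad], dim=1)           # (B, T)
    p = torch.softmax(torch.stack([to_left, to_right], dim=-1), dim=-1)
    # Merge probability of pair (i, i+1): geometric mean of the two
    # directions, so both tokens must agree to link.
    a = (p[:, :-1, 1] * p[:, 1:, 0] + eps).sqrt()         # (B, T-1)
    if a_prev is not None:
        # Hierarchical constraint: priors can only grow with depth, so
        # pairs merged in lower layers stay merged in higher ones.
        a = a_prev + (1.0 - a_prev) * a
    # C[i, j] = product of merge probabilities between positions i and j,
    # computed in log space via cumulative sums.
    log_cum = torch.cat(
        [torch.zeros(B, 1), torch.cumsum(torch.log(a + eps), dim=1)], dim=1
    )                                                     # (B, T)
    upper = torch.exp(log_cum.unsqueeze(1) - log_cum.unsqueeze(2)).triu(1)
    C = upper + upper.transpose(1, 2) + torch.eye(T)      # own span = 1
    return C, a
```

Under this reading, each encoder layer would gate its standard scaled dot-product attention with the prior, e.g. E = C * softmax(QK^T / sqrt(d)), before the weighted sum over values, so attention mass is pushed toward words in the same induced constituent.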
Pages: 1061-1070 (10 pages)
Related Papers (50 total):
  • [21] Efficient memristor accelerator for transformer self-attention functionality
    Bettayeb, Meriem
    Halawani, Yasmin
    Khan, Muhammad Umair
    Saleh, Hani
    Mohammad, Baker
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [22] A lightweight transformer with linear self-attention for defect recognition
    Zhai, Yuwen
    Li, Xinyu
    Gao, Liang
    Gao, Yiping
    ELECTRONICS LETTERS, 2024, 60 (17)
  • [23] Transformer with sparse self-attention mechanism for image captioning
    Wang, Duofeng
    Hu, Haifeng
    Chen, Dihu
    ELECTRONICS LETTERS, 2020, 56 (15): 764+
  • [24] An efficient parallel self-attention transformer for CSI feedback
    Liu, Ziang
    Song, Tianyu
    Zhao, Ruohan
    Jin, Jiyu
    Jin, Guiyue
    PHYSICAL COMMUNICATION, 2024, 66
  • [25] Transformer Self-Attention Network for Forecasting Mortality Rates
    Roshani, Amin
    Izadi, Muhyiddin
    Khaledi, Baha-Eldin
    JIRSS-JOURNAL OF THE IRANIAN STATISTICAL SOCIETY, 2022, 21 (01): 81-103
  • [26] Keyword Transformer: A Self-Attention Model for Keyword Spotting
    Berg, Axel
    O'Connor, Mark
    Cruz, Miguel Tairum
    INTERSPEECH 2021, 2021: 4249-4253
  • [27] Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
    Leem, Saebom
    Seo, Hyunseok
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 2956-2964
  • [28] Decomformer: Decompose Self-Attention of Transformer for Efficient Image Restoration
    Lee, Eunho
    Hwang, Youngbae
    IEEE ACCESS, 2024, 12: 38672-38684
  • [29] Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
    Hao, Yaru
    Dong, Li
    Wei, Furu
    Xu, Ke
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 12963-12971
  • [30] RSAFormer: A method of polyp segmentation with region self-attention transformer
    Yin, X.
    Zeng, J.
    Hou, T.
    Tang, C.
    Gan, C.
    Jain, D. K.
    García, S.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 172