Tree Transformer: Integrating Tree Structures into Self-Attention

Cited by: 0
Authors: Wang, Yau-Shian [1]; Lee, Hung-Yi [1]; Chen, Yun-Nung [1]
Affiliations: [1] National Taiwan University, Taipei, Taiwan
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by attention heads does not appear to match human intuitions about hierarchical structure. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between two adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
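The mechanism the abstract describes can be made concrete with a minimal PyTorch sketch. It assumes the probabilities that adjacent words belong to the same constituent have already been computed (in the paper, via self-attention between neighboring words); the function names constituent_prior and tree_attention, the tensor shapes, and the renormalization step are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn.functional as F

def constituent_prior(link_probs):
    # link_probs: (batch, seq_len - 1); link_probs[:, i] is the probability
    # that adjacent words i and i+1 belong to the same constituent, obtained
    # from self-attention between neighboring words (assumed precomputed here).
    log_a = torch.log(link_probs.clamp_min(1e-9))
    cum = F.pad(log_a.cumsum(dim=-1), (1, 0))  # (batch, seq_len), cum[:, 0] = 0
    # Constituent prior: C[b, i, j] = prod of link probabilities between
    # positions i and j, i.e. exp(-|cum[b, i] - cum[b, j]|), so the prior
    # decays toward zero across constituent boundaries.
    return torch.exp(-(cum.unsqueeze(-1) - cum.unsqueeze(-2)).abs())

def tree_attention(q, k, v, link_probs):
    # Standard scaled dot-product attention whose probabilities are
    # multiplied elementwise by the constituent prior and renormalized,
    # so each word mostly attends within its own constituent.
    d = q.size(-1)
    probs = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    probs = probs * constituent_prior(link_probs)
    probs = probs / probs.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return probs @ v

# Toy usage with random tensors:
q = k = v = torch.randn(2, 5, 8)
links = torch.sigmoid(torch.randn(2, 4))
out = tree_attention(q, k, v, links)  # (2, 5, 8)

In the paper, the link probabilities are additionally constrained across layers so that they are non-decreasing with depth, which makes the induced constituents grow from lower to higher layers and yields a tree; the sketch above only shows a single layer.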
Pages: 1061-1070 (10 pages)