Tree Transformer: Integrating Tree Structures into Self-Attention

Cited by: 0
Authors
Wang, Yau-Shian [1]
Lee, Hung-Yi [1]
Chen, Yun-Nung [1]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords
DOI
None available
CLC classification code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures; the attention computed by the attention heads does not seem to match human intuition about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is implemented simply with self-attention between adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
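As a rough illustration of the mechanism the abstract describes, the Python sketch below scores each adjacent word pair with self-attention, turns those scores into a constituent prior over spans, and uses the prior to constrain ordinary attention weights. The function names, the log-space span product, and the renormalization step are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn.functional as F

def adjacent_link_scores(x):
    # Probability that word i and word i+1 belong to the same constituent.
    # x: (batch, seq_len, d_model) hidden states -> (batch, seq_len - 1) scores in (0, 1).
    left, right = x[:, :-1], x[:, 1:]                        # adjacent word pairs
    logits = (left * right).sum(-1) / left.size(-1) ** 0.5   # scaled dot product
    return torch.sigmoid(logits)

def constituent_prior(link):
    # Prior C[i, j] = product of the adjacent link scores between positions i and j,
    # so two words separated by a weak link receive a small prior.
    log_link = torch.log(link + 1e-9)
    cum = F.pad(torch.cumsum(log_link, dim=-1), (1, 0))      # (batch, seq_len)
    span = cum[:, None, :] - cum[:, :, None]                 # cum[j] - cum[i]
    span = torch.minimum(span, span.transpose(1, 2))         # symmetric, <= 0
    return torch.exp(span)                                   # diagonal is exp(0) = 1

def constrained_attention(q, k, v, prior):
    # Standard scaled dot-product attention, re-weighted by the constituent prior.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    weights = prior * F.softmax(scores, dim=-1)
    weights = weights / (weights.sum(-1, keepdim=True) + 1e-9)
    return weights @ v

# Toy usage: one pass of prior-constrained attention over random hidden states.
x = torch.randn(2, 7, 64)
prior = constituent_prior(adjacent_link_scores(x))
out = constrained_attention(x, x, x, prior)                  # (2, 7, 64)

The multiplicative prior only attenuates attention across weakly linked word boundaries; it never adds new attention mass, which matches the abstract's framing of the module as an extra constraint on existing heads.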
Pages: 1061-1070
Number of pages: 10