Tree Transformer: Integrating Tree Structures into Self-Attention

Cited by: 0
Authors
Wang, Yau-Shian [1]
Lee, Hung-Yi [1]
Chen, Yun-Nung [1]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords
DOI
None available
CLC classification code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures; the attention computed by the attention heads does not seem to match human intuition about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is implemented simply with self-attention between adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
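As a rough illustration of the mechanism the abstract describes, the Python sketch below scores each adjacent word pair with self-attention, turns those scores into a constituent prior over spans, and uses the prior to constrain ordinary attention weights. The function names, the log-space span product, and the renormalization step are illustrative assumptions, not the paper's reference implementation.

import torch
import torch.nn.functional as F

def adjacent_link_scores(x):
    # Probability that word i and word i+1 belong to the same constituent.
    # x: (batch, seq_len, d_model) hidden states -> (batch, seq_len - 1) scores in (0, 1).
    left, right = x[:, :-1], x[:, 1:]                        # adjacent word pairs
    logits = (left * right).sum(-1) / left.size(-1) ** 0.5   # scaled dot product
    return torch.sigmoid(logits)

def constituent_prior(link):
    # Prior C[i, j] = product of the adjacent link scores between positions i and j,
    # so two words separated by a weak link receive a small prior.
    log_link = torch.log(link + 1e-9)
    cum = F.pad(torch.cumsum(log_link, dim=-1), (1, 0))      # (batch, seq_len)
    span = cum[:, None, :] - cum[:, :, None]                 # cum[j] - cum[i]
    span = torch.minimum(span, span.transpose(1, 2))         # symmetric, <= 0
    return torch.exp(span)                                   # diagonal is exp(0) = 1

def constrained_attention(q, k, v, prior):
    # Standard scaled dot-product attention, re-weighted by the constituent prior.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    weights = prior * F.softmax(scores, dim=-1)
    weights = weights / (weights.sum(-1, keepdim=True) + 1e-9)
    return weights @ v

# Toy usage: one pass of prior-constrained attention over random hidden states.
x = torch.randn(2, 7, 64)
prior = constituent_prior(adjacent_link_scores(x))
out = constrained_attention(x, x, x, prior)                  # (2, 7, 64)

The multiplicative prior only attenuates attention across weakly linked word boundaries; it never adds new attention mass, which matches the abstract's framing of the module as an extra constraint on existing heads.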
Pages: 1061-1070
Number of pages: 10