Tree Transformer: Integrating Tree Structures into Self-Attention

Citations: 0
Authors:
Wang, Yau-Shian [1 ]
Lee, Hung-Yi [1 ]
Chen, Yun-Nung [1 ]
Affiliations:
[1] Natl Taiwan Univ, Taipei, Taiwan
Keywords: none listed
DOI: none
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task has achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be induced automatically from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between adjacent words. With the same training procedure as BERT, the experiments demonstrate the effectiveness of Tree Transformer in inducing tree structures, improving language modeling, and learning more explainable attention scores.
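The abstract only sketches how the "Constituent Attention" module works. Below is a minimal PyTorch sketch of one plausible reading of it: adjacent-word link scores, a per-layer merge probability for each neighboring pair, and a span-level constituent prior formed as the product of merge probabilities along the span. The function name constituent_prior, the projections w_q and w_k, and the exact normalization details are our assumptions for illustration, not the authors' released implementation.

```python
import torch

def constituent_prior(h, w_q, w_k, a_prev=None, eps=1e-9):
    """Sketch of a constituent prior for one encoder layer.

    h:       (B, T, d) hidden states of the layer.
    w_q/w_k: (d, d) link query/key projections (illustrative names).
    a_prev:  (B, T-1) merge probabilities from the previous layer, if any.
    Returns C (B, T, T), a soft prior that tokens i..j form a constituent,
    and the updated merge probabilities a (B, T-1) for the next layer.
    """
    B, T, d = h.shape
    q, k = h @ w_q, h @ w_k
    # Score each token against its right and left neighbour.
    s_right = (q[:, :-1] * k[:, 1:]).sum(-1) / d ** 0.5   # i -> i+1
    s_left = (q[:, 1:] * k[:, :-1]).sum(-1) / d ** 0.5    # i+1 -> i
    # Each token normalises over its (at most) two neighbours;
    # boundary tokens get -inf for the missing side.
    pad = torch.full((B, 1), float("-inf"))
    to_left = torch.cat([pad, s_left], dim=1)             # (B, T)
    to_right = torch.cat([s_right, pad], dim=1)           # (B, T)
    p = torch.softmax(torch.stack([to_left, to_right], dim=-1), dim=-1)
    # Merge probability of pair (i, i+1): geometric mean of the two
    # directions, so both tokens must agree to link.
    a = (p[:, :-1, 1] * p[:, 1:, 0] + eps).sqrt()         # (B, T-1)
    if a_prev is not None:
        # Hierarchical constraint: priors can only grow with depth, so
        # pairs merged in lower layers stay merged in higher ones.
        a = a_prev + (1.0 - a_prev) * a
    # C[i, j] = product of merge probabilities between positions i and j,
    # computed in log space via cumulative sums.
    log_cum = torch.cat(
        [torch.zeros(B, 1), torch.cumsum(torch.log(a + eps), dim=1)], dim=1
    )                                                     # (B, T)
    upper = torch.exp(log_cum.unsqueeze(1) - log_cum.unsqueeze(2)).triu(1)
    C = upper + upper.transpose(1, 2) + torch.eye(T)      # own span = 1
    return C, a
```

Under this reading, each encoder layer would gate its standard scaled dot-product attention with the prior, e.g. E = C * softmax(QK^T / sqrt(d)), before the weighted sum over values, so attention mass is pushed toward words in the same induced constituent.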
Pages: 1061-1070 (10 pages)
Related Papers (50 total):
  • [21] Efficient memristor accelerator for transformer self-attention functionality
    Bettayeb, Meriem
    Halawani, Yasmin
    Khan, Muhammad Umair
    Saleh, Hani
    Mohammad, Baker
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [22] A lightweight transformer with linear self-attention for defect recognition
    Zhai, Yuwen
    Li, Xinyu
    Gao, Liang
    Gao, Yiping
    ELECTRONICS LETTERS, 2024, 60 (17)
  • [23] Transformer with sparse self-attention mechanism for image captioning
    Wang, Duofeng
    Hu, Haifeng
    Chen, Dihu
    ELECTRONICS LETTERS, 2020, 56 (15): 764+
  • [24] An efficient parallel self-attention transformer for CSI feedback
    Liu, Ziang
    Song, Tianyu
    Zhao, Ruohan
    Jin, Jiyu
    Jin, Guiyue
    PHYSICAL COMMUNICATION, 2024, 66
  • [25] Transformer Self-Attention Network for Forecasting Mortality Rates
    Roshani, Amin
    Izadi, Muhyiddin
    Khaledi, Baha-Eldin
    JIRSS-JOURNAL OF THE IRANIAN STATISTICAL SOCIETY, 2022, 21 (01): 81-103
  • [26] Keyword Transformer: A Self-Attention Model for Keyword Spotting
    Berg, Axel
    O'Connor, Mark
    Cruz, Miguel Tairum
    INTERSPEECH 2021, 2021: 4249-4253
  • [27] Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
    Leem, Saebom
    Seo, Hyunseok
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 2956-2964
  • [28] Decomformer: Decompose Self-Attention of Transformer for Efficient Image Restoration
    Lee, Eunho
    Hwang, Youngbae
    IEEE ACCESS, 2024, 12: 38672-38684
  • [29] Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
    Hao, Yaru
    Dong, Li
    Wei, Furu
    Xu, Ke
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 12963-12971
  • [30] RSAFormer: A method of polyp segmentation with region self-attention transformer
    Yin, X.
    Zeng, J.
    Hou, T.
    Tang, C.
    Gan, C.
    Jain, D. K.
    García, S.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 172