Improving BERT With Self-Supervised Attention

Cited by: 1
Authors
Chen, Yiren [1]
Kou, Xiaoyu [1]
Bai, Jiangang [1]
Tong, Yunhai [1]
Affiliations
[1] Peking University, School of Electronics Engineering and Computer Science, Key Laboratory of Machine Perception (MOE), Beijing 100871, People's Republic of China
Keywords
Task analysis; Bit error rate; Predictive models; Data models; Training; Training data; Licenses; Natural language processing; attention model; text classification; BERT; pre-trained model
DOI
10.1109/ACCESS.2021.3122273
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
One of the most popular paradigms for applying a large pre-trained NLP model such as BERT is to fine-tune it on a smaller dataset. A challenge remains, however: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in the sentence, which are easy for human beings to understand, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique called Self-Supervised Attention (SSA) to help address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach that combines their benefits. Empirically, on a variety of public datasets, we demonstrate significant performance improvements with our SSA-enhanced BERT model.
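To make the abstract's probing step more concrete, the sketch below is a minimal, illustrative take on how weak token-level attention labels could be produced by probing an already fine-tuned classifier, assuming PyTorch and the HuggingFace transformers library. The masking-based heuristic, the 0.05 threshold, the model name, and the function probe_token_labels are assumptions made for illustration; they are not the authors' actual SSA procedure.

# Minimal sketch (assumption, not the paper's actual SSA algorithm):
# probe a fine-tuned classifier to obtain weak token-level labels.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # stand-in for the fine-tuned model of the previous iteration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def probe_token_labels(sentence: str, threshold: float = 0.05):
    """Mark a token as important (1) if masking it noticeably lowers the
    confidence of the originally predicted class, otherwise unimportant (0)."""
    enc = tokenizer(sentence, return_tensors="pt")
    probs = torch.softmax(model(**enc).logits, dim=-1)
    pred = probs.argmax(dim=-1).item()
    base_conf = probs[0, pred].item()

    labels = []
    for i, token_id in enumerate(enc["input_ids"][0].tolist()):
        token = tokenizer.convert_ids_to_tokens(token_id)
        if token in tokenizer.all_special_tokens:
            labels.append(0)  # never supervise [CLS], [SEP], padding
            continue
        masked_ids = enc["input_ids"].clone()
        masked_ids[0, i] = tokenizer.mask_token_id
        masked_probs = torch.softmax(
            model(input_ids=masked_ids,
                  attention_mask=enc["attention_mask"]).logits, dim=-1)
        drop = base_conf - masked_probs[0, pred].item()
        labels.append(int(drop > threshold))
    return labels

# Example probe; the weak labels would then act as auxiliary token-level
# supervision in the next fine-tuning round (that training loop is not shown).
print(probe_token_labels("The movie was surprisingly good."))

In an SSA-style pipeline, as described in the abstract, such weak labels would supervise an auxiliary token-level objective in the next fine-tuning round, and the probing would then be repeated with the newly fine-tuned model.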
Pages: 144129-144139
Page count: 11
Related Papers
50 records in total
  • [1] Improving novelty detection by self-supervised learning and channel attention mechanism. Tian, Miao; Cui, Ying; Long, Haixia; Li, Junxia. Industrial Robot - The International Journal of Robotics Research and Application, 2021, 48(5): 673-679.
  • [2] Self-supervised Attention Learning for Robot Control. Cong, Lin; Shi, Yunlei; Zhang, Jianwei. 2021 IEEE International Conference on Robotics and Biomimetics (IEEE-ROBIO 2021), 2021: 1153-1158.
  • [3] Guiding Attention for Self-Supervised Learning with Transformers. Deshpande, Ameet; Narasimhan, Karthik. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 4676-4686.
  • [4] Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation. Chi, Po-Han; Chung, Pei-Hung; Wu, Tsung-Han; Hsieh, Chun-Cheng; Chen, Yen-Hao; Li, Shang-Wen; Lee, Hung-yi. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 344-350.
  • [5] Understanding Self-Attention of Self-Supervised Audio Transformers. Yang, Shu-wen; Liu, Andy T.; Lee, Hung-yi. INTERSPEECH 2020, 2020: 3785-3789.
  • [6] Reinforcement Learning with Attention that Works: A Self-Supervised Approach. Manchin, Anthony; Abbasnejad, Ehsan; van den Hengel, Anton. Neural Information Processing (ICONIP 2019), Part V, 2019, 1143: 223-230.
  • [7] Heuristic Attention Representation Learning for Self-Supervised Pretraining. Van Nhiem Tran; Liu, Shen-Hsuan; Li, Yung-Hui; Wang, Jia-Ching. Sensors, 2022, 22(14).
  • [8] Graph Multihead Attention Pooling with Self-Supervised Learning. Wang, Yu; Hu, Liang; Wu, Yang; Gao, Wanfu. Entropy, 2022, 24(12).
  • [9] Self-supervised recurrent depth estimation with attention mechanisms. Makarov, Ilya; Bakhanova, Maria; Nikolenko, Sergey; Gerasimova, Olga. PeerJ Computer Science, 2022, 8.