Improving BERT With Self-Supervised Attention

Cited by: 1
Authors
Chen, Yiren [1]
Kou, Xiaoyu [1]
Bai, Jiangang [1]
Tong, Yunhai [1]
Affiliations
[1] Peking University, School of Electronics Engineering and Computer Science, Key Laboratory of Machine Perception (MOE), Beijing 100871, People's Republic of China
Keywords
Task analysis; Bit error rate; Predictive models; Data models; Training; Training data; Licenses; Natural language processing; attention model; text classification; BERT; pre-trained model
DOI
10.1109/ACCESS.2021.3122273
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
One of the most popular paradigms for applying a large pre-trained NLP model such as BERT is to fine-tune it on a smaller dataset. A challenge remains, however: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in the sentence, which are easy for human beings to understand, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique called Self-Supervised Attention (SSA) to help address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach that combines their benefits. Empirically, on a variety of public datasets, we demonstrate significant performance improvements with our SSA-enhanced BERT model.
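To make the abstract's probing step more concrete, the sketch below is a minimal, illustrative take on how weak token-level attention labels could be produced by probing an already fine-tuned classifier, assuming PyTorch and the HuggingFace transformers library. The masking-based heuristic, the 0.05 threshold, the model name, and the function probe_token_labels are assumptions made for illustration; they are not the authors' actual SSA procedure.

# Minimal sketch (assumption, not the paper's actual SSA algorithm):
# probe a fine-tuned classifier to obtain weak token-level labels.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # stand-in for the fine-tuned model of the previous iteration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def probe_token_labels(sentence: str, threshold: float = 0.05):
    """Mark a token as important (1) if masking it noticeably lowers the
    confidence of the originally predicted class, otherwise unimportant (0)."""
    enc = tokenizer(sentence, return_tensors="pt")
    probs = torch.softmax(model(**enc).logits, dim=-1)
    pred = probs.argmax(dim=-1).item()
    base_conf = probs[0, pred].item()

    labels = []
    for i, token_id in enumerate(enc["input_ids"][0].tolist()):
        token = tokenizer.convert_ids_to_tokens(token_id)
        if token in tokenizer.all_special_tokens:
            labels.append(0)  # never supervise [CLS], [SEP], padding
            continue
        masked_ids = enc["input_ids"].clone()
        masked_ids[0, i] = tokenizer.mask_token_id
        masked_probs = torch.softmax(
            model(input_ids=masked_ids,
                  attention_mask=enc["attention_mask"]).logits, dim=-1)
        drop = base_conf - masked_probs[0, pred].item()
        labels.append(int(drop > threshold))
    return labels

# Example probe; the weak labels would then act as auxiliary token-level
# supervision in the next fine-tuning round (that training loop is not shown).
print(probe_token_labels("The movie was surprisingly good."))

In an SSA-style pipeline, as described in the abstract, such weak labels would supervise an auxiliary token-level objective in the next fine-tuning round, and the probing would then be repeated with the newly fine-tuned model.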
Pages: 144129-144139
Page count: 11
Related Papers
50 records in total
  • [1] Improving novelty detection by self-supervised learning and channel attention mechanism. Tian, Miao; Cui, Ying; Long, Haixia; Li, Junxia. Industrial Robot - The International Journal of Robotics Research and Application, 2021, 48(5): 673-679.
  • [2] Self-supervised Attention Learning for Robot Control. Cong, Lin; Shi, Yunlei; Zhang, Jianwei. 2021 IEEE International Conference on Robotics and Biomimetics (IEEE-ROBIO 2021), 2021: 1153-1158.
  • [3] Guiding Attention for Self-Supervised Learning with Transformers. Deshpande, Ameet; Narasimhan, Karthik. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 4676-4686.
  • [4] Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation. Chi, Po-Han; Chung, Pei-Hung; Wu, Tsung-Han; Hsieh, Chun-Cheng; Chen, Yen-Hao; Li, Shang-Wen; Lee, Hung-yi. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 344-350.
  • [5] Understanding Self-Attention of Self-Supervised Audio Transformers. Yang, Shu-wen; Liu, Andy T.; Lee, Hung-yi. INTERSPEECH 2020, 2020: 3785-3789.
  • [6] Reinforcement Learning with Attention that Works: A Self-Supervised Approach. Manchin, Anthony; Abbasnejad, Ehsan; van den Hengel, Anton. Neural Information Processing (ICONIP 2019), Part V, 2019, 1143: 223-230.
  • [7] Heuristic Attention Representation Learning for Self-Supervised Pretraining. Van Nhiem Tran; Liu, Shen-Hsuan; Li, Yung-Hui; Wang, Jia-Ching. Sensors, 2022, 22(14).
  • [8] Graph Multihead Attention Pooling with Self-Supervised Learning. Wang, Yu; Hu, Liang; Wu, Yang; Gao, Wanfu. Entropy, 2022, 24(12).
  • [9] Self-supervised recurrent depth estimation with attention mechanisms. Makarov, Ilya; Bakhanova, Maria; Nikolenko, Sergey; Gerasimova, Olga. PeerJ Computer Science, 2022, 8.