Enhancing performance of transformer-based models in natural language understanding through word importance embedding

Cited by: 1
Authors
Hong, Seung-Kyu [1]
Jang, Jae-Seok [1]
Kwon, Hyuk-Yoon [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Grad Sch Data Sci, 232 Gongneung Ro, Seoul 01811, South Korea
Keywords
Natural language understanding; Transformer; Word importance; Word dependency
DOI
10.1016/j.knosys.2024.112404
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer-based models have achieved state-of-the-art performance on natural language understanding (NLU) tasks by learning important token relationships through the attention mechanism. However, we observe that attention can become overly distributed during fine-tuning, failing to adequately preserve the dependencies between meaningful tokens. This phenomenon negatively affects the learning of token relationships in sentences. To overcome this issue, we propose a methodology that embeds word importance (WI) into transformer-based models as a new layer that weights words according to their importance. Our simple yet powerful approach offers a general technique to boost transformer model capabilities on NLU tasks by mitigating the risk of attention dispersion during fine-tuning. Through extensive experiments on the GLUE, SuperGLUE, and SQuAD benchmarks for pre-trained models (BERT, RoBERTa, ELECTRA, and DeBERTa), and the MMLU, Big Bench Hard, and DROP benchmarks for the large language model Llama2, we validate the effectiveness of our method in consistently enhancing performance across models with negligible overhead. Furthermore, we validate that our WI layer better preserves the dependencies between important tokens than standard fine-tuning by introducing a model that classifies dependent tokens from the learned attention weights. The code is available at https://github.com/bigbases/WordImportance.
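To make the abstract's idea concrete, the sketch below illustrates one way a word-importance weighting layer could re-scale token embeddings before they reach a transformer encoder. This is not the authors' implementation (their code is in the repository linked above); the class name WordImportanceLayer, the linear scorer, the softmax weighting, and the residual combination are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class WordImportanceLayer(nn.Module):
    """Hypothetical sketch of a word-importance weighting layer.

    NOT the paper's implementation; it only illustrates the general idea of
    weighting tokens by a learned importance score before the encoder.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        # One scalar importance score per token, derived from its embedding.
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, token_embeddings: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
        scores = self.scorer(token_embeddings).squeeze(-1)       # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)   # ignore padding
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)    # (batch, seq_len, 1)
        # Re-weight tokens; the residual keeps the original signal intact.
        return token_embeddings + weights * token_embeddings


if __name__ == "__main__":
    layer = WordImportanceLayer(hidden_size=768)
    x = torch.randn(2, 16, 768)                  # dummy token embeddings
    mask = torch.ones(2, 16, dtype=torch.long)   # no padding in this toy batch
    print(layer(x, mask).shape)                  # torch.Size([2, 16, 768])
```

In a setup like this, such a layer would sit between the embedding layer and the encoder stack of a pre-trained model and be trained jointly during fine-tuning; the exact placement and scoring scheme used in the paper may differ.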
Pages: 13