Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Cited by: 0
Authors
Zheng, Junhao [1]
Ma, Qianli [1]
Qiu, Shengjie [1]
Wu, Yue [1]
Ma, Peitian [1]
Liu, Junlong [1]
Feng, Huawen [1]
Shang, Xichen [1]
Chen, Haibin [1]
Affiliations
[1] South China University of Technology, School of Computer Science and Engineering, Guangzhou, China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Fine-tuning is a simple and effective technique to transfer the learned knowledge of Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades the generalization ability. Most existing studies attribute this to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying what knowledge is transferable. Motivated by this, we frame fine-tuning into a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data. Based on the causal view, we propose a unified objective for fine-tuning to retrieve the causality back. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and the causal objective, which preserves old knowledge from PLMs. Therefore, our method is flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to inflate the performance of existing QA models.
Pages: 9155-9173
Page count: 19
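The abstract describes a unified objective that sums the vanilla fine-tuning objective with a causal objective that preserves knowledge from the PLM. The paper's actual causal estimator is not given in this record, so the following is only a minimal illustrative sketch in PyTorch: it assumes the preservation term can be approximated by a KL divergence toward a frozen copy of the pre-trained model, and the names `unified_fine_tuning_loss` and `lam` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: a combined loss of the form
#   L_total = L_fine-tune + lambda * L_preserve
# where L_preserve is a generic stand-in (KL to a frozen PLM), not the
# paper's actual causal objective or heuristic estimation.
import torch
import torch.nn.functional as F


def unified_fine_tuning_loss(student_logits, teacher_logits, labels, lam=1.0):
    # Vanilla fine-tuning objective: learn new knowledge from the target data.
    task_loss = F.cross_entropy(student_logits, labels)
    # Preservation objective: keep the fine-tuned model's predictive
    # distribution close to that of the frozen pre-trained model.
    preserve_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return task_loss + lam * preserve_loss


# Toy usage with random logits for a 5-way classification batch of size 4.
student = torch.randn(4, 5, requires_grad=True)
teacher = torch.randn(4, 5)  # from a frozen copy of the pre-trained model
labels = torch.randint(0, 5, (4,))
loss = unified_fine_tuning_loss(student, teacher, labels, lam=0.5)
loss.backward()
```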