Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Cited by: 0
Authors
Zheng, Junhao [1]
Ma, Qianli [1]
Qiu, Shengjie [1]
Wu, Yue [1]
Ma, Peitian [1]
Liu, Junlong [1]
Feng, Huawen [1]
Shang, Xichen [1]
Chen, Haibin [1]
Affiliations
[1] South China University of Technology, School of Computer Science and Engineering, Guangzhou, China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Fine-tuning is a simple and effective technique to transfer the learned knowledge of Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades the generalization ability. Most existing studies attribute this to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying what knowledge is transferable. Motivated by this, we frame fine-tuning into a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data. Based on the causal view, we propose a unified objective for fine-tuning to retrieve the causality back. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and the causal objective, which preserves old knowledge from PLMs. Therefore, our method is flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to inflate the performance of existing QA models.
Pages: 9155-9173
Page count: 19
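The abstract describes a unified objective that sums the vanilla fine-tuning objective with a causal objective that preserves knowledge from the PLM. The paper's actual causal estimator is not given in this record, so the following is only a minimal illustrative sketch in PyTorch: it assumes the preservation term can be approximated by a KL divergence toward a frozen copy of the pre-trained model, and the names `unified_fine_tuning_loss` and `lam` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: a combined loss of the form
#   L_total = L_fine-tune + lambda * L_preserve
# where L_preserve is a generic stand-in (KL to a frozen PLM), not the
# paper's actual causal objective or heuristic estimation.
import torch
import torch.nn.functional as F


def unified_fine_tuning_loss(student_logits, teacher_logits, labels, lam=1.0):
    # Vanilla fine-tuning objective: learn new knowledge from the target data.
    task_loss = F.cross_entropy(student_logits, labels)
    # Preservation objective: keep the fine-tuned model's predictive
    # distribution close to that of the frozen pre-trained model.
    preserve_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return task_loss + lam * preserve_loss


# Toy usage with random logits for a 5-way classification batch of size 4.
student = torch.randn(4, 5, requires_grad=True)
teacher = torch.randn(4, 5)  # from a frozen copy of the pre-trained model
labels = torch.randint(0, 5, (4,))
loss = unified_fine_tuning_loss(student, teacher, labels, lam=0.5)
loss.backward()
```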