Bridging the Gap between Pre-Training and Fine-Tuning for Commonsense Generation

Cited by: 0
Authors
Yang, Haoran [1 ]
Wang, Yan [2 ]
Li, Piji [2 ]
Bi, Wei [2 ]
Lam, Wai [1 ]
Xu, Chen [3 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
[3] Beijing Univ Technol, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Commonsense generation aims to generate a plausible sentence containing all of the given unordered concept words. Previous methods for this task usually concatenate these words directly as the input to a pre-trained language model (PLM). However, during PLM pre-training, the inputs are typically corrupted sentences whose word order is still correct. This discrepancy between the input distributions of pre-training and fine-tuning makes it difficult for the model to fully exploit the knowledge stored in PLMs. In this paper, we propose a two-stage framework to alleviate this issue. First, in the pre-training stage, we design a new input format that endows PLMs with the ability to handle masked sentences with incorrect word order. Second, during fine-tuning, we insert the special token [MASK] between every two consecutive concept words, making the fine-tuning input distribution more similar to that of pre-training. We conduct extensive experiments and provide a thorough analysis to demonstrate the effectiveness of the proposed method. The code is available at https://github.com/LHRYANG/CommonGen.
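The fine-tuning input format described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the authors' released code: the helper name, the example concept list, and the choice of mask token string are assumptions; it only shows how [MASK] tokens could be interleaved between concept words before the sequence is tokenized and fed to a PLM.

```python
# Minimal sketch (assumed, not the authors' released implementation) of the
# fine-tuning input format: insert a [MASK] token between consecutive concept
# words so the input resembles the corrupted, masked inputs seen in pre-training.

def build_masked_concept_input(concepts, mask_token="[MASK]"):
    """Interleave a mask token between the given (unordered) concept words.

    Example: ["dog", "frisbee", "catch"] -> "dog [MASK] frisbee [MASK] catch"
    """
    return f" {mask_token} ".join(concepts)


if __name__ == "__main__":
    concepts = ["dog", "frisbee", "catch", "throw"]  # hypothetical concept set
    src = build_masked_concept_input(concepts)
    print(src)  # dog [MASK] frisbee [MASK] catch [MASK] throw
    # `src` would then be tokenized and passed to the PLM encoder, with the
    # target being a plausible sentence that covers all of the concepts.
```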
Pages: 376-383
Number of pages: 8
Related Papers (50 records in total)
  • [31] CODE: Contrastive Pre-training with Adversarial Fine-Tuning for Zero-Shot Expert Linking
    Chen, Bo; Zhang, Jing; Zhang, Xiaokang; Tang, Xiaobin; Cai, Lingfan; Chen, Hong; Li, Cuiping; Zhang, Peng; Tang, Jie
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 11846-11854
  • [32] Trajectory-BERT: Pre-training and fine-tuning bidirectional transformers for crowd trajectory enhancement
    Li, Lingyu; Huang, Tianyu; Li, Yihao; Li, Peng
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2023, 34 (3-4)
  • [33] Editorial for Special Issue on Large-scale Pre-training: Data, Models, and Fine-tuning
    Wen, Ji-Rong; Huang, Zi; Zhang, Hanwang
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02): 145-146
  • [34] Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition
    Wang, Guanhong; Zhou, Yang; He, Zhanhao; Lu, Keyu; Feng, Yang; Liu, Zuozhu; Wang, Gaoang
    NEUROCOMPUTING, 2024, 571
  • [35] Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
    Nakamoto, Mitsuhiko; Zhai, Yuexiang; Singh, Anikait; Mark, Max Sobol; Ma, Yi; Finn, Chelsea; Kumar, Aviral; Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [36] Rethinking Resource Management in Edge Learning: A Joint Pre-Training and Fine-Tuning Design Paradigm
    Lyu, Zhonghao; Li, Yuchen; Zhu, Guangxu; Xu, Jie; Poor, H. Vincent; Cui, Shuguang
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2025, 24 (02): 1584-1601
  • [37] Empower Post-hoc Graph Explanations with Information Bottleneck: A Pre-training and Fine-tuning Perspective
    Wang, Jihong; Luo, Minnan; Li, Jundong; Lin, Yun; Dong, Yushun; Dong, Jin Song; Zheng, Qinghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023: 2349-2360
  • [38] Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering
    Addlesee, Angus; Sieinska, Weronika; Gunson, Nancie; Garcia, Daniel Hernandez; Dondrup, Christian; Lemon, Oliver
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023: 229-241
  • [39] Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering
    Soni, Sarvesh; Roberts, Kirk
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 5532-5538
  • [40] From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain
    Bonfigli, Agnese; Bacco, Luca; Merone, Mario; Dell'Orletta, Felice
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 157