Pre-training Language Model as a Multi-perspective Course Learner

Cited: 0
Authors
Chen, Beiduo [1 ,2 ]
Huang, Shaohan [2 ]
Zhang, Zihan [2 ]
Guo, Wu [1 ]
Ling, Zhenhua [1 ]
Huang, Haizhen [2 ]
Wei, Furu [2 ]
Deng, Weiwei [2 ]
Zhang, Qi [2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
[2] Microsoft Corp, Beijing, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
ELECTRA (Clark et al., 2020), the generator-discriminator pre-training framework, has achieved impressive semantic construction capability across various downstream tasks. Despite its convincing performance, ELECTRA still faces the challenges of monotonous training and deficient interaction: a generator trained with only masked language modeling (MLM) leads to biased learning and label imbalance for the discriminator, decreasing learning efficiency, and the absence of an explicit feedback loop from discriminator to generator creates a chasm between these two components, underutilizing the course learning. In this study, a multi-perspective course learning (MCL) method is proposed to provide multiple degrees and visual angles for sample-efficient pre-training, and to fully leverage the relationship between generator and discriminator. Concretely, three self-supervision courses are designed to alleviate the inherent flaws of MLM and balance the labels in a multi-perspective way. Besides, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary supervision. Moreover, a course-soups trial is conducted to solve the "tug-of-war" dynamics problem of MCL, yielding a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by an absolute 2.8% and 3.2% on the GLUE and SQuAD 2.0 benchmarks respectively, and overshadows recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.
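To make the generator-discriminator interaction described above concrete, the following is a minimal pure-Python sketch of how ELECTRA-style corruption produces the discriminator's replaced-token-detection labels. It is an illustration only, not the authors' code: the function name is hypothetical, and a random sampler stands in for the generator's MLM predictions.

```python
import random

def electra_style_labels(tokens, mask_prob=0.15, seed=0):
    """Sketch of ELECTRA-style corruption: a stand-in 'generator' fills
    randomly chosen positions, and each position is labeled 1 if the
    token was actually replaced, 0 if it matches the original."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))          # toy vocabulary built from the input
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            sample = rng.choice(vocab)   # stand-in for the generator's MLM sample
            corrupted.append(sample)
            labels.append(int(sample != tok))  # generator may guess correctly
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

Note how a label of 0 can arise both from untouched positions and from positions where the generator happens to reproduce the original token; with a low `mask_prob`, positive labels are scarce, which is the label-imbalance problem the abstract says MCL's multi-perspective courses are designed to mitigate.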
Pages: 114-128
Page count: 15