Pre-training Language Model as a Multi-perspective Course Learner

Cited by: 0
Authors
Chen, Beiduo [1 ,2 ]
Huang, Shaohan [2 ]
Zhang, Zihan [2 ]
Guo, Wu [1 ]
Ling, Zhenhua [1 ]
Huang, Haizhen [2 ]
Wei, Furu [2 ]
Deng, Weiwei [2 ]
Zhang, Qi [2 ]
Affiliations
[1] University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, People's Republic of China
[2] Microsoft Corporation, Beijing, People's Republic of China
Keywords: (none listed)
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
ELECTRA (Clark et al., 2020), the generator-discriminator pre-training framework, has achieved impressive semantic construction capability across various downstream tasks. Despite this convincing performance, ELECTRA still faces the challenges of monotonous training and deficient interaction. A generator trained with only masked language modeling (MLM) leads to biased learning and label imbalance for the discriminator, decreasing learning efficiency; the lack of an explicit feedback loop from the discriminator to the generator leaves a chasm between the two components, underutilizing the course learning. In this study, a multi-perspective course learning (MCL) method is proposed to provide multiple degrees and viewing angles for sample-efficient pre-training, and to fully leverage the relationship between the generator and the discriminator. Concretely, three self-supervision courses are designed to alleviate the inherent flaws of MLM and to balance the labels in a multi-perspective way. In addition, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary supervision. Moreover, a course soups trial is conducted to address the "tug-of-war" dynamics of MCL, yielding a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by 2.8 and 3.2 absolute points on the GLUE and SQuAD 2.0 benchmarks, respectively, and outperforms recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.
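For readers unfamiliar with the framework the abstract builds on, the following is a minimal PyTorch sketch of ELECTRA-style replaced-token-detection (RTD) pre-training, i.e., the generator-discriminator loop that MCL extends with its self-supervision and self-correction courses. The SimpleEncoder module, toy model sizes, masking rate, and loss weighting are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of ELECTRA-style replaced-token-detection (RTD) pre-training,
# the generator-discriminator setup that MCL builds on and extends with "courses".
# SimpleEncoder and all hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID = 1000, 64, 0


class SimpleEncoder(nn.Module):
    """Stand-in Transformer encoder: embedding + a small TransformerEncoder stack."""
    def __init__(self, vocab, hidden):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        return self.encoder(self.embed(ids))


generator = SimpleEncoder(VOCAB, HIDDEN)       # small MLM generator
gen_head = nn.Linear(HIDDEN, VOCAB)            # predicts tokens at masked positions
discriminator = SimpleEncoder(VOCAB, HIDDEN)   # larger than the generator in practice
disc_head = nn.Linear(HIDDEN, 1)               # per-token original-vs-replaced logit


def rtd_step(tokens, mask_prob=0.15):
    """One pre-training step: MLM loss for the generator, RTD loss for the discriminator."""
    mask = torch.rand(tokens.shape) < mask_prob
    corrupted = tokens.masked_fill(mask, MASK_ID)

    # Generator: recover the masked tokens (standard MLM objective).
    gen_logits = gen_head(generator(corrupted))
    mlm_loss = F.cross_entropy(gen_logits[mask], tokens[mask])

    # Sample replacements from the generator; no gradient flows through sampling.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
    replaced = tokens.clone()
    replaced[mask] = sampled

    # Discriminator: detect which tokens were replaced. The label imbalance the
    # abstract mentions arises here, since only ~15% of positions can be replaced.
    is_replaced = (replaced != tokens).float()
    disc_logits = disc_head(discriminator(replaced)).squeeze(-1)
    rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    return mlm_loss + 50.0 * rtd_loss   # ELECTRA weights the RTD term heavily


loss = rtd_step(torch.randint(1, VOCAB, (8, 32)))
loss.backward()

The "course soups" trial mentioned in the abstract is not shown here; by analogy with model soups, it can presumably be read as combining (e.g., weight-averaging) checkpoints trained under different course configurations, but the exact procedure is specified in the paper itself.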
Pages: 114-128
Page count: 15