Pre-training Language Model as a Multi-perspective Course Learner

Cited by: 0
Authors
Chen, Beiduo [1 ,2 ]
Huang, Shaohan [2 ]
Zhang, Zihan [2 ]
Guo, Wu [1 ]
Ling, Zhenhua [1 ]
Huang, Haizhen [2 ]
Wei, Furu [2 ]
Deng, Weiwei [2 ]
Zhang, Qi [2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
[2] Microsoft Corp, Beijing, Peoples R China
Keywords:
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
ELECTRA (Clark et al., 2020), the generator-discriminator pre-training framework, has achieved impressive semantic construction capability across various downstream tasks. Despite this convincing performance, ELECTRA still faces the challenges of monotonous training and deficient interaction. A generator trained only with masked language modeling (MLM) leads to biased learning and label imbalance for the discriminator, decreasing learning efficiency; the lack of an explicit feedback loop from the discriminator to the generator leaves a chasm between the two components, underutilizing the course learning. In this study, a multi-perspective course learning (MCL) method is proposed to provide multiple perspectives for sample-efficient pre-training and to fully leverage the relationship between the generator and the discriminator. Concretely, three self-supervision courses are designed to alleviate the inherent flaws of MLM and to balance labels in a multi-perspective way. In addition, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary supervision. Moreover, a course soups trial is conducted to solve the "tug-of-war" dynamics problem of MCL, yielding a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by 2.8 and 3.2 absolute points on the GLUE and SQuAD 2.0 benchmarks, respectively, and outperforms recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.
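For orientation, the sketch below is not the authors' code but a minimal illustration of the ELECTRA-style objective that MCL builds on: a generator trained with MLM, a discriminator trained with replaced-token detection (RTD) on the generator-corrupted sequence, and a simple weight-averaging helper as one possible reading of the abstract's "course soups" idea. All module names, toy sizes, and the loss weight are assumptions of this sketch; the multi-perspective self-supervision and self-correction courses themselves are not reproduced.

```python
# Minimal sketch (assumptions only, not the MCL implementation) of the
# ELECTRA-style generator/discriminator objective plus an illustrative
# "soup" by parameter averaging.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID = 1000, 64, 0  # toy vocabulary/width and mask id (hypothetical)

class TinyEncoder(nn.Module):
    """Stand-in for a Transformer encoder: embedding + linear head."""
    def __init__(self, out_dim):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, out_dim)

    def forward(self, ids):
        return self.head(self.emb(ids))

generator = TinyEncoder(VOCAB)   # predicts tokens at masked positions (MLM)
discriminator = TinyEncoder(1)   # predicts replaced vs. original per token (RTD)

def electra_step(original_ids, mask_prob=0.15, rtd_weight=50.0):
    """One ELECTRA-style step: MLM loss on the generator, RTD loss on the
    discriminator over the generator-corrupted sequence."""
    mask = torch.rand_like(original_ids, dtype=torch.float) < mask_prob
    masked_ids = original_ids.masked_fill(mask, MASK_ID)

    gen_logits = generator(masked_ids)                       # (B, T, VOCAB)
    mlm_loss = F.cross_entropy(gen_logits[mask], original_ids[mask])

    # Sample the generator's predictions to build the corrupted input;
    # no gradient flows through the discrete samples, as in ELECTRA.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted_ids = torch.where(mask, sampled, original_ids)
    is_replaced = (corrupted_ids != original_ids).float()

    disc_logits = discriminator(corrupted_ids).squeeze(-1)   # (B, T)
    rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return mlm_loss + rtd_weight * rtd_loss

def average_state_dicts(state_dicts):
    """Uniform parameter average of several checkpoints - one plausible
    interpretation of a "course soup" over models trained on different courses."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(0) for k in keys}

if __name__ == "__main__":
    ids = torch.randint(1, VOCAB, (2, 16))
    loss = electra_step(ids)
    loss.backward()
    print("combined loss:", float(loss))
```

In the actual paper the courses add further supervision signals on top of this baseline and the soup is formed over course-specific models, so the helper above should be read only as a shape-level illustration of weight averaging.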
Pages: 114-128
Number of pages: 15