Pre-training Language Model as a Multi-perspective Course Learner

Cited by: 0
Authors
Chen, Beiduo [1 ,2 ]
Huang, Shaohan [2 ]
Zhang, Zihan [2 ]
Guo, Wu [1 ]
Ling, Zhenhua [1 ]
Huang, Haizhen [2 ]
Wei, Furu [2 ]
Deng, Weiwei [2 ]
Zhang, Qi [2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
[2] Microsoft Corp, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
ELECTRA (Clark et al., 2020), the generator-discriminator pre-training framework, has achieved impressive semantic construction capability across various downstream tasks. Despite its convincing performance, ELECTRA still faces two challenges: monotonous training and deficient interaction. A generator trained with only masked language modeling (MLM) leads to biased learning and label imbalance for the discriminator, decreasing learning efficiency; the lack of an explicit feedback loop from the discriminator to the generator creates a chasm between the two components, leaving course learning underexploited. In this study, a multi-perspective course learning (MCL) method is proposed to provide multiple degrees and viewing angles for sample-efficient pre-training, and to fully leverage the relationship between generator and discriminator. Concretely, three self-supervision courses are designed to alleviate the inherent flaws of MLM and balance the labels in a multi-perspective way. Besides, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary supervision. Moreover, a course-soups trial is conducted to solve the "tug-of-war" dynamics problem of MCL, yielding a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by 2.8 and 3.2 absolute points on the GLUE and SQuAD 2.0 benchmarks, respectively, and overshadows recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.
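For readers unfamiliar with the generator-discriminator setup the abstract refers to, the sketch below shows one ELECTRA-style replaced-token-detection training step in PyTorch/Transformers. It is a minimal illustration of the base framework MCL extends, not the authors' MCL code: the model checkpoints, masking rate, and the 50x discriminator loss weight are assumptions taken from the original ELECTRA recipe, and the multi-perspective and self-correction courses described in the abstract are not reproduced here.

```python
# Minimal sketch of a single ELECTRA-style pre-training step
# (the generator-discriminator framework that MCL builds on).
# Illustrative only; not the MCL training code.
import torch
from transformers import ElectraForMaskedLM, ElectraForPreTraining, ElectraTokenizerFast

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

def electra_step(text, mask_prob=0.15):
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]

    # Randomly mask a fraction of non-special tokens for the generator's MLM objective.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True),
        dtype=torch.bool,
    )
    mask = (torch.rand(input_ids.shape) < mask_prob) & ~special
    masked_ids = input_ids.clone()
    masked_ids[mask] = tokenizer.mask_token_id
    mlm_labels = torch.where(mask, input_ids, torch.tensor(-100))  # -100 = ignore in MLM loss

    # Generator: MLM loss, then sample replacement tokens at the masked positions.
    gen_out = generator(input_ids=masked_ids, attention_mask=enc["attention_mask"], labels=mlm_labels)
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_out.logits).sample()
    corrupted = torch.where(mask, sampled, input_ids)

    # Discriminator: replaced-token detection (label 1 where the token differs from the original).
    rtd_labels = (corrupted != input_ids).long()
    disc_out = discriminator(input_ids=corrupted, attention_mask=enc["attention_mask"], labels=rtd_labels)

    # Joint loss; ELECTRA up-weights the discriminator term (50x is the paper's setting).
    return gen_out.loss + 50.0 * disc_out.loss

loss = electra_step("The quick brown fox jumps over the lazy dog.")
loss.backward()
```

MCL's contribution, per the abstract, is to replace this single monotonous course with several self-supervision and self-correction courses and to average the resulting models ("course soups"); those pieces would sit on top of a loop like the one above.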
Pages: 114-128
Page count: 15