CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation

Cited by: 2
Authors
Shao, Yunfan [1 ,3 ]
Geng, Zhichao [1 ,3 ]
Liu, Yitao [1 ,3 ]
Dai, Junqi [1 ,3 ]
Yan, Hang [1 ,3 ]
Yang, Fei [2 ]
Li, Zhe [2 ]
Bao, Hujun [2 ]
Qiu, Xipeng [1 ,3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Zhejiang Lab, Hangzhou 311121, Peoples R China
[3] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
pre-trained model; transformer; language model; generation; unified model;
DOI
10.1007/s11432-021-3536-5
Chinese Library Classification
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese pre-trained unbalanced transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, together with the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With this partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with its two decoders and (2) be fine-tuned flexibly, which fully exploits the potential of the model. Moreover, the unbalanced transformer reduces computational and storage costs, which makes CPT competitive and greatly accelerates inference in text generation. Experimental results on a wide range of Chinese NLU and NLG tasks demonstrate the effectiveness of CPT.
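To make the architecture described in the abstract concrete, below is a minimal structural sketch of the unbalanced encoder-decoder layout in PyTorch. It is an illustrative assumption, not the authors' released implementation: the class name CPTSketch, the layer counts (a deep shared encoder with two shallow decoders), the hidden size, the vocabulary size, and the omission of positional embeddings and the pre-training losses are all simplifications made here for brevity.

```python
# Minimal structural sketch of CPT's unbalanced layout (assumed PyTorch style).
# Layer counts, hidden size, and vocabulary size are illustrative placeholders.
import torch
import torch.nn as nn


class CPTSketch(nn.Module):  # hypothetical name, not the released implementation
    def __init__(self, vocab_size=50000, d_model=768, n_heads=12,
                 enc_layers=10, dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Deep shared encoder: holds most of the parameters and is reused
        # by both the NLU and NLG paths.
        self.shared_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            enc_layers)

        # Shallow understanding decoder (U-decoder): non-autoregressive,
        # pre-trained with masked language modeling (MLM); sketched here as
        # a few additional encoder-style layers feeding an LM head.
        self.u_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            dec_layers)

        # Shallow generation decoder (G-decoder): autoregressive with
        # cross-attention, pre-trained with denoising auto-encoding (DAE).
        self.g_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            dec_layers)

        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def encode(self, input_ids):
        # Shared representation used by both paths.
        return self.shared_encoder(self.embed(input_ids))

    def forward_nlu(self, input_ids):
        # MLM path: shared encoder -> U-decoder -> token logits.
        return self.lm_head(self.u_decoder(self.encode(input_ids)))

    def forward_nlg(self, input_ids, decoder_input_ids):
        # DAE path: shared encoder -> causal G-decoder -> token logits.
        memory = self.encode(input_ids)
        tgt = self.embed(decoder_input_ids)
        length = tgt.size(1)
        causal_mask = torch.triu(
            torch.full((length, length), float("-inf"), device=tgt.device),
            diagonal=1)
        return self.lm_head(self.g_decoder(tgt, memory, tgt_mask=causal_mask))
```

Because both decoders are shallow relative to the shared encoder, the autoregressive generation path runs only a few decoder layers per decoding step, which is where the claimed speedup in generation inference would come from; fine-tuning can then use the NLU path, the NLG path, or both, depending on the downstream task.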
Pages: 13
Related Papers
50 records in total
  • [1] CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation
    Yunfan SHAO
    Zhichao GENG
    Yitao LIU
    Junqi DAI
    Hang YAN
    Fei YANG
    Zhe LI
    Hujun BAO
    Xipeng QIU
    [J]. Science China (Information Sciences), 2024, 67 (05): 43-55
  • [2] AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation
    Tian, Huishuang
    Yang, Kexin
    Liu, Dayiheng
    Lv, Jiancheng
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [3] ShellGPT: Generative Pre-trained Transformer Model for Shell Language Understanding
    Shi, Jie
    Jiang, Sihang
    Xu, Bo
    Liang, Jiaqing
    Xiao, Yanghua
    Wang, Wei
    [J]. 2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023, : 671 - 682
  • [4] Automatic Title Generation for Text with Pre-trained Transformer Language Model
    Mishra, Prakhar
    Diwan, Chaitali
    Srinivasa, Srinath
    Srinivasaraghavan, G.
    [J]. 2021 IEEE 15TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2021), 2021, : 17 - 24
  • [5] JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding
    Zhao, Wayne Xin
    Zhou, Kun
    Gong, Zheng
    Zhang, Beichen
    Zhou, Yuanhang
    Sha, Jing
    Chen, Zhigang
    Wang, Shijin
    Liu, Cong
    Wen, Ji-Rong
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4571 - 4581
  • [6] Pre-trained Language Model Representations for Language Generation
    Edunov, Sergey
    Baevski, Alexei
    Auli, Michael
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4052 - 4059
  • [7] Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models
    Liang, Xinnian
    Zhou, Zefan
    Huang, Hui
    Wu, Shuangzhi
    Xiao, Tong
    Yang, Muyun
    Li, Zhoujun
    Bian, Chao
    [J]. arXiv, 2023,
  • [8] Understanding Online Attitudes with Pre-Trained Language Models
    Power, William
    Obradovic, Zoran
    [J]. PROCEEDINGS OF THE 2023 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2023, 2023, : 745 - 752
  • [9] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    [J]. JOURNAL OF BIG DATA, 2022, 9 (01)