共 4 条
- [1] CPM: A large-scale generative Chinese Pre-trained language model[J] . Zhang Zhengyan,Han Xu,Zhou Hao,Ke Pei,Gu Yuxian,Ye Deming,Qin Yujia,Su Yusheng,Ji Haozhe,Guan Jian,Qi Fanchao,Wang Xiaozhi,Zheng Yanan,Zeng Guoyang,Cao Huanqi,Chen Shengqi,Li Daixuan,Sun Zhenbo,Liu Zhiyuan,Huang Minlie,Han Wentao,Tang Jie,Li Juanzi,Zhu Xiaoyan,Sun Maosong. AI Open . 2021
- [2] RoBERTa: A Robustly Optimized BERT Pretraining Approach.[J] . Yinhan Liu,Myle Ott,Naman Goyal,Jingfei Du,Mandar Joshi,Danqi Chen,Omer Levy,Mike Lewis,Luke Zettlemoyer,Veselin Stoyanov. CoRR . 2019
- [3] DAPPLE:a pipelined data parallel approach for training large models .2 FAN S Q,RONG Y,MENG C,et al. https://cs.paperswithcode.com/paper/dapple-a-pipelined-data-parallel-approach-for . 2022
- [4] Exploring the limits of transfer learning with a unified text-to-text transformer .2 RAFFEL C,SHAZEER N,ROBERTS A,et al. https://arxiv.org/abs/1910.10683 . 2022