137 references in total
- [1] Vaswani A, Shazeer N, Parmar N, et al., Attention is all you need[C], Advances in Neural Information Processing Systems 30: Annual Conf on Neural Information Processing Systems 2017, pp. 5998-6008, (2017)
- [2] Bender E M, Gebru T, McMillan-Major A, et al., On the dangers of stochastic parrots: Can language models be too big?[C], Proc of the 2021 ACM Conf on Fairness, Accountability, and Transparency, pp. 610-623, (2021)
- [3] OpenAI, GPT-4 technical report, (2023)
- [4] Radford A, Wu J, Child R, et al., Language models are unsupervised multitask learners[J], OpenAI Blog, 1, 8, pp. 1-24, (2019)
- [5] Anil R, Dai A M, Firat O, et al., PaLM 2 technical report, (2023)
- [6] Touvron H, Martin L, Stone K, et al., LLaMA 2: Open foundation and fine-tuned chat models, (2023)
- [7] Sun Yu, Wang Shuohuan, Feng Shikun, et al., ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation, (2021)
- [8] Du Zhengxiao, Qian Yujie, Liu Xiao, et al., GLM: General language model pretraining with autoregressive blank infilling[C], Proc of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 320-335, (2022)
- [9] Ren Xiaozhe, Zhou Pingyi, Meng Xinfan, et al., PanGu-Σ: Towards trillion parameter language model with sparse heterogeneous computing, (2023)
- [10] Bai Jinze, Bai Shuai, Yang Shusheng, et al., Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, (2023)