A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning

Cited by: 10
Authors
Lee, Minhyeok [1]
Affiliation
[1] Chung Ang Univ, Sch Elect & Elect Engn, Seoul 06974, South Korea
Keywords
generative pre-trained transformer; GPT; ChatGPT; self-supervised learning; deep learning; natural language processing; NLP
DOI
10.3390/math11112451
CLC Classification
O1 [Mathematics]
Subject Classification
0701; 070101
Abstract
In this paper, we present a rigorous mathematical examination of generative pre-trained transformer (GPT) models and their autoregressive self-supervised learning mechanisms. We begin by defining the natural language space and the knowledge space, two key concepts for understanding the dimensionality reduction process in GPT-based large language models (LLMs). By exploring projection functions and their inverses, we establish a framework for analyzing the language generation capabilities of these models. We then investigate the GPT representation space and examine its implications for the models' approximation properties. Finally, we discuss the limitations and challenges of GPT models and their learning mechanisms, considering the trade-offs between complexity and generalization as well as the implications of incomplete inverse projection functions. Our findings demonstrate that GPT models can encode knowledge into low-dimensional vectors through their autoregressive self-supervised learning mechanism. This analysis provides a solid mathematical foundation for future work on GPT-based LLMs and promises improvements in natural language processing tasks such as language translation, text summarization, and question answering through a better understanding and optimization of model training and performance.
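To make the abstract's setup concrete, the following is a minimal formal sketch; the symbols \mathcal{L}, \mathcal{K}, f, and p_\theta are illustrative placeholders, not notation taken from the paper itself. Writing \mathcal{L} for the natural language space and \mathcal{K} for the knowledge space, with \dim(\mathcal{K}) \ll \dim(\mathcal{L}), a projection f : \mathcal{L} \to \mathcal{K} encodes text into a low-dimensional knowledge vector, and its generally incomplete inverse f^{-1} : \mathcal{K} \to \mathcal{L} maps knowledge back to language. The autoregressive self-supervised objective is then the standard next-token log-likelihood over a text corpus \mathcal{D}:

\[
\max_{\theta} \; \mathbb{E}_{x \sim \mathcal{D}} \left[ \sum_{t=1}^{T} \log p_{\theta}(x_t \mid x_1, \ldots, x_{t-1}) \right]
\]

On this reading, generation behaves like the composition f^{-1} \circ f: training shapes the encoding f through next-token prediction, while the incompleteness of f^{-1} corresponds to the limitation noted in the abstract, namely that not every point of the knowledge space decodes to well-formed language.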
Pages: 19
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Liang, Hongliang; Li, Xiangyu; Xiao, Da; Liu, Jie; Zhou, Yanjie; Wang, Aibo; Li, Jin. Generative Pre-Trained Transformer-Based Reinforcement Learning for Testing Web Application Firewalls. IEEE Transactions on Dependable and Secure Computing, 2024, 21(1): 309-324.
  • [32] Heng, Jonathan J. Y.; Teo, Desmond B.; Tan, L. F. The impact of Chat Generative Pre-trained Transformer (ChatGPT) on medical education. Postgraduate Medical Journal, 2023, 99(1176): 1125-1127.
  • [33] Askarizade, Mojgan. Enhancing rumor detection with data augmentation and generative pre-trained transformer. Expert Systems with Applications, 2025, 262.
  • [34] Luo, Renqian; Sun, Liai; Xia, Yingce; Qin, Tao; Zhang, Sheng; Poon, Hoifung; Liu, Tie-Yan. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 2022, 23(6).
  • [35] Wang, Han; Liu, Min; Shen, Weiming. Industrial-generative pre-trained transformer for intelligent manufacturing systems. IET Collaborative Intelligent Manufacturing, 2023, 5(2).
  • [36] Fiedler, Anna K.; Zhang, Kai; Lal, Tia S.; Jiang, Xiaoqian; Fraser, Stuart M. Generative Pre-trained Transformer for Pediatric Stroke Research: A Pilot Study. Pediatric Neurology, 2024, 160.
  • [37] Shi, Jie; Jiang, Sihang; Xu, Bo; Liang, Jiaqing; Xiao, Yanghua; Wang, Wei. ShellGPT: Generative Pre-trained Transformer Model for Shell Language Understanding. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023: 671-682.
  • [38] Mishra, Animesh; Jha, Ritesh; Bhattacharjee, Vandana. SSCLNet: A Self-Supervised Contrastive Loss-Based Pre-Trained Network for Brain MRI Classification. IEEE Access, 2023, 11: 6673-6681.
  • [39] Girish, K. V. Vijay; Konjeti, Srikanth; Vepa, Jithendra. Interpretability of Speech Emotion Recognition modelled using Self-Supervised Speech and Text Pre-Trained Embeddings. Interspeech 2022, 2022: 4496-4500.
  • [40] Xu, Xiaopeng; Xu, Tiantian; Zhou, Juexiao; Liao, Xingyu; Zhang, Ruochi; Wang, Yu; Zhang, Lu; Gao, Xin. AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning. Genomics Proteomics & Bioinformatics, 2023, 21(5): 1043-1053.