A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning

Cited by: 18
Authors
Kotei, Evans [1 ]
Thirunavukarasu, Ramkumar [1 ]
Affiliations
[1] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore 632014, India
Keywords
transformer network; transfer learning; pretraining; natural language processing; language models; BERT
DOI
10.3390/info14030187
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
Transfer learning is a technique used in deep learning to transfer knowledge learned in one domain to a different target domain. It mainly addresses the problem of limited training data, which leads to overfitting and degrades model performance. The study was carried out on publications retrieved from digital libraries such as SCOPUS, ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar, which formed the primary studies. Secondary studies were retrieved from the primary articles using backward and forward snowballing. Based on defined inclusion and exclusion criteria, relevant publications were selected for review. The study focused on transfer learning with pretrained NLP models based on the deep transformer network. BERT and GPT are the two leading pretrained models, trained on large unlabeled text corpora through self-supervised learning to capture global and local representations. Pretrained transformer models offer numerous advantages to natural language processing, such as transferring knowledge to downstream tasks and thereby mitigating the drawbacks of training a model from scratch. This review gives a comprehensive view of the transformer architecture, self-supervised learning and pretraining concepts in language models, and their adaptation to downstream tasks. Finally, we present future directions for further improving pretrained transformer-based language models.
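The abstract describes how knowledge acquired during self-supervised pretraining is transferred to a downstream task by fine-tuning rather than training from scratch. The following is a minimal sketch of that workflow, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the texts, labels, and hyperparameters are illustrative only and are not taken from the reviewed paper.

# Minimal fine-tuning sketch: adapt a pretrained BERT encoder to a downstream
# classification task. Assumes the Hugging Face "transformers" library; data
# and hyperparameters below are toy placeholders for illustration.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A fresh classification head is attached on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["the movie was great", "the plot made no sense"]  # toy downstream data
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # forward pass; loss is cross-entropy
outputs.loss.backward()                   # fine-tune encoder and head jointly
optimizer.step()
optimizer.zero_grad()

In practice the pretrained encoder weights are updated with a small learning rate over the labeled downstream dataset, so the knowledge learned from unlabeled text during pretraining is reused rather than relearned.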
Pages: 25
Related papers (50 total)
  • [2] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    [J]. JOURNAL OF BIG DATA, 2022, 9 (01)
  • [3] Enhancing Pre-trained Language Models by Self-supervised Learning for Story Cloze Test
    Xie, Yuqiang
    Hu, Yue
    Xing, Luxi
    Wang, Chunhui
    Hu, Yong
    Wei, Xiangpeng
    Sun, Yajing
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2020), PT I, 2020, 12274 : 271 - 279
  • [5] Prediction of Protein Tertiary Structure Using Pre-Trained Self-Supervised Learning Based on Transformer
    Kurniawan, Alif
    Jatmiko, Wisnu
    Hertadi, Rukman
    Habibie, Novian
    [J]. 2020 5TH INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS 2020), 2020, : 75 - 80
  • [6] Multi-task Active Learning for Pre-trained Transformer-based Models
    Rotman, Guy
    Reichart, Roi
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1209 - 1228
  • [7] A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models
    Zhang, Hanqing
    Song, Haolin
    Li, Shaoyu
    Zhou, Ming
    Song, Dawei
    [J]. ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [8] Unsupervised Visual Anomaly Detection Using Self-Supervised Pre-Trained Transformer
    Kim, Jun-Hyung
    Kwon, Goo-Rak
    [J]. IEEE ACCESS, 2024, 12 : 127604 - 127613
  • [9] Transformer-Based Self-Supervised Learning for Emotion Recognition
    Vazquez-Rodriguez, Juan
    Lefebvre, Gregoire
    Cumin, Julien
    Crowley, James L.
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2605 - 2612
  • [10] BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
    Jia, Jinyuan
    Liu, Yupei
    Gong, Neil Zhenqiang
    [J]. 43RD IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2022), 2022, : 2043 - 2059