A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning

Cited by: 18
Authors: Kotei, Evans [1]; Thirunavukarasu, Ramkumar [1]
Affiliations: [1] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore 632014, India
Keywords: transformer network; transfer learning; pretraining; natural language processing; language models; BERT
DOI: 10.3390/info14030187
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Classification Code: 0812
Abstract:
Transfer learning is a technique used in deep learning to transfer knowledge learned in one domain to a different target domain. It primarily addresses the problem of limited training data, which causes model overfitting and degrades performance. The study was carried out on publications retrieved from digital libraries such as SCOPUS, ScienceDirect, IEEE Xplore, the ACM Digital Library, and Google Scholar, which formed the primary studies. Secondary studies were retrieved from the primary articles using the backward and forward snowballing approach. Relevant publications were selected for review based on predefined inclusion and exclusion criteria. The study focused on transfer learning with pretrained NLP models built on the deep transformer network. BERT and GPT are the two leading pretrained models, trained to capture global and local representations from large unlabeled text datasets through self-supervised learning. Pretrained transformer models offer numerous advantages to natural language processing, such as transferring knowledge to downstream tasks, which mitigates the drawbacks of training a model from scratch. This review gives a comprehensive view of the transformer architecture, self-supervised learning and pretraining concepts in language models, and their adaptation to downstream tasks. Finally, we present future directions for further improving pretrained transformer-based language models.
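The pretraining-then-fine-tuning workflow the abstract describes can be made concrete with a short sketch. The following is a minimal, illustrative example, not code from the reviewed paper; it assumes the Hugging Face transformers and PyTorch packages, the bert-base-uncased checkpoint, and a toy two-example sentiment dataset. It loads weights pretrained through self-supervised masked language modeling and fine-tunes them, together with a freshly initialized classification head, on a small labeled downstream task.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load weights pretrained on large unlabeled text via self-supervised
    # masked language modeling; a new classification head is initialized on top.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Toy labeled downstream data (hypothetical sentiment examples).
    texts = ["a gripping, well-paced read", "dull and repetitive throughout"]
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Fine-tune all weights with a small learning rate so the pretrained
    # knowledge is adapted rather than overwritten.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):  # a few gradient steps on the tiny batch
        outputs = model(**batch, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Because the encoder arrives with general language knowledge, even a handful of labeled examples moves the classifier in a useful direction, which is the overfitting remedy the abstract attributes to transfer learning.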
Pages: 25
Related Papers (50 in total)
  • [41] Transformer-Based Self-Supervised Monocular Depth and Visual Odometry
    Zhao, Hongru; Qiao, Xiuquan; Ma, Yi; Tafazolli, Rahim
    IEEE SENSORS JOURNAL, 2023, 23 (02): 1436-1446
  • [43] CheSS: Chest X-Ray Pre-trained Model via Self-supervised Contrastive Learning
    Cho, Kyungjin; Kim, Ki Duk; Nam, Yujin; Jeong, Jiheon; Kim, Jeeyoung; Choi, Changyong; Lee, Soyoung; Lee, Jun Soo; Woo, Seoyeon; Hong, Gil-Sun; Seo, Joon Beom; Kim, Namkug
    JOURNAL OF DIGITAL IMAGING, 2023, 36 (03): 902-910
  • [44] Adapting Pre-Trained Self-Supervised Learning Model for Speech Recognition with Light-Weight Adapters
    Yue, Xianghu; Gao, Xiaoxue; Qian, Xinyuan; Li, Haizhou
    ELECTRONICS, 2024, 13 (01)
  • [45] Pre-trained Language Models in Biomedical Domain: A Systematic Survey
    Wang, Benyou; Xie, Qianqian; Pei, Jiahuan; Chen, Zhihong; Tiwari, Prayag; Li, Zhao; Fu, Jie
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [46] A Robust Approach to Fine-tune Pre-trained Transformer-based Models for Text Summarization through Latent Space Compression
    Falaki, Ala Alam; Gras, Robin
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2022: 160-167
  • [47] Meta Distant Transfer Learning for Pre-trained Language Models
    Wang, Chengyu; Pan, Haojie; Qiu, Minghui; Yang, Fei; Huang, Jun; Zhang, Yin
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 9742-9752
  • [48] SSCLNet: A Self-Supervised Contrastive Loss-Based Pre-Trained Network for Brain MRI Classification
    Mishra, Animesh; Jha, Ritesh; Bhattacharjee, Vandana
    IEEE ACCESS, 2023, 11: 6673-6681
  • [49] Explore the Use of Self-supervised Pre-trained Acoustic Features on Disguised Speech Detection
    Quan, Jie; Yang, Yingchun
    BIOMETRIC RECOGNITION (CCBR 2021), 2021, 12878: 483-490
  • [50] Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation
    Bie, Rongfang; Jiang, Jinxiu; Xie, Hongcheng; Guo, Yu; Miao, Yinbin; Jia, Xiaohua
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (05): 2613-2625