SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers

Cited: 0
Authors
Alahmadi, Mohammad D. [1 ]
Alshangiti, Moayad [1 ]
Alsubhi, Jumana [2 ]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 23890, Saudi Arabia
[2] Univ Georgia, Sch Comp, Athens, GA 30602 USA
Keywords
SCC (Source Code Classification); NLP (Natural Language Processing); Large Language Model (LLM)
DOI
10.3390/math12132128
CLC Number
O1 [Mathematics];
Discipline Code
0701; 070101;
Abstract
Developers often rely on online resources, such as Stack Overflow (SO), to seek assistance for programming tasks. To facilitate effective search and resource discovery, manual tagging of questions and posts with the appropriate programming language is essential. However, accurate tagging is not consistently achieved, leading to the need for the automated classification of code snippets into the correct programming language as a tag. In this study, we introduce a novel approach to the automated classification of code snippets from SO posts into programming languages using generative pre-trained transformers (GPT). Our method, which requires neither additional training on labeled data nor pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ OpenAI's text-davinci-003 model (GPT-3.5) and postprocess its responses to accurately identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving a median F1-score improvement that ranges from +6% to +31%. These findings underscore the effectiveness of SCC-GPT in enhancing code snippet classification, offering a cost-effective and efficient solution for developers who rely on SO for programming assistance.
Pages: 12
Related Papers
50 records
  • [31] HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer
    Xu, Xiaopeng
    Xu, Chencheng
    He, Wenjia
    Wei, Lesong
    Li, Haoyang
    Zhou, Juexiao
    Zhang, Ruochi
    Wang, Yu
    Xiong, Yuanpeng
    Gao, Xin
    BIOINFORMATICS, 2024, 40 (06)
  • [32] On solving textual ambiguities and semantic vagueness in MRC based question answering using generative pre-trained transformers
    Ahmed, Muzamil
    Khan, Hikmat
    Iqbal, Tassawar
    Alarfaj, Fawaz Khaled
    Alomair, Abdullah
    Almusallam, Naif
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [33] Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding
    Wang, Deze
    Jia, Zhouyang
    Li, Shanshan
    Yu, Yue
    Xiong, Yun
    Dong, Wei
    Liao, Xiangke
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 287 - 298
  • [34] Medical image Generative Pre-Trained Transformer (MI-GPT): future direction for precision medicine
    Zhang, Xiaohui
    Zhong, Yan
    Jin, Chentao
    Hu, Daoyan
    Tian, Mei
    Zhang, Hong
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2024, 51 (02) : 332 - 335
  • [35] Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery
    Korngiebel, Diane M.
    Mooney, Sean D.
    NPJ DIGITAL MEDICINE, 2021, 4 (01)
  • [37] Students' Perspectives on the Application of a Generative Pre-Trained Transformer (GPT) in Chemistry Learning: A Case Study in Indonesia
    Ardyansyah, Ananta
    Yuwono, Agung Budhi
    Rahayu, Sri
    Alsulami, Naif Mastoor
    Sulistina, Oktavia
    JOURNAL OF CHEMICAL EDUCATION, 2024, 101 (09) : 3666 - 3675
  • [38] Brain Tumor Classification Using a Pre-Trained Auxiliary Generative Adversarial Network
    Kumaar, M. Akshay
    Samiayya, Duraimurugan
    Rajinikanth, Venkatesan
    Vincent, P. M. Durai Raj
    Kadry, Seifedine
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2024, 8 (06) : 101 - 111
  • [39] Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada
    Mousavi, Mehdi
    Shafiee, Shabnam
    Harley, Jason M.
    Cheung, Jackie Chi Kit
    Abbasgholizadeh Rahimi, Samira
    FAMILY MEDICINE AND COMMUNITY HEALTH, 2024, 12 (SUPPL_1)
  • [40] Ensemble Learning with Pre-Trained Transformers for Crash Severity Classification: A Deep NLP Approach
    Jaradat, Shadi
    Nayak, Richi
    Paz, Alexander
    Elhenawy, Mohammed
    ALGORITHMS, 2024, 17 (07)