SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers

Cited by: 0
Authors
Alahmadi, Mohammad D. [1 ]
Alshangiti, Moayad [1 ]
Alsubhi, Jumana [2 ]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 23890, Saudi Arabia
[2] Univ Georgia, Sch Comp, Athens, GA 30602 USA
Keywords
SCC (Source Code Classification); NLP (Natural Language Processing); Large Language Model (LLM)
DOI
10.3390/math12132128
Chinese Library Classification (CLC)
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
Developers often rely on online resources, such as Stack Overflow (SO), for assistance with programming tasks. Effective search and resource discovery depend on questions and posts being tagged with the appropriate programming language, yet manual tagging is not consistently accurate, motivating the automated classification of code snippets into the correct programming language as a tag. In this study, we introduce a novel approach that automatically classifies code snippets from SO posts into programming languages using generative pre-trained transformers (GPT). Our method, which requires neither additional training on labeled data nor pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ the text-davinci-003 model from the GPT-3.5 family and post-process its responses to accurately identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving median F1-score improvements ranging from +6% to +31%. These findings underscore the effectiveness of SCC-GPT for code snippet classification and offer a cost-effective, efficient solution for developers who rely on SO for programming assistance.
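
For illustration, the pipeline the abstract describes (prompt a GPT model with a code snippet, then post-process its free-text reply into a language label) might look like the minimal Python sketch below. It assumes the legacy OpenAI Completions API (openai-python < 1.0, matching the era of text-davinci-003, which OpenAI has since deprecated); the prompt wording, the example label set, and the normalization rules are illustrative assumptions, not the authors' exact implementation.

import re
import openai  # legacy openai-python (< 1.0) Completions API

openai.api_key = "YOUR_API_KEY"  # placeholder

# Illustrative label set: the paper targets 19 languages, but its exact
# list may differ from this assumption.
LANGUAGES = {
    "python", "java", "javascript", "c", "c++", "c#", "php", "ruby",
    "go", "swift", "html", "css", "sql", "bash", "r", "objective-c",
    "perl", "scala", "kotlin",
}

def classify_snippet(snippet: str) -> str:
    """Ask text-davinci-003 for the snippet's language, then normalize."""
    prompt = (
        "Identify the programming language of the following code snippet. "
        "Answer with the language name only.\n\n" + snippet
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=8,
        temperature=0,  # deterministic replies simplify post-processing
    )
    raw = response["choices"][0]["text"].strip().lower()
    # Post-process: drop punctuation and whitespace, keep known labels only.
    raw = re.sub(r"[^a-z0-9+#-]", "", raw)
    return raw if raw in LANGUAGES else "unknown"

print(classify_snippet("def add(a, b):\n    return a + b"))  # -> python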
Pages: 12
Related Papers
50 results
  • [1] Generative pre-trained transformers (GPT) for surface engineering
    Kamnis, Spyros
    SURFACE & COATINGS TECHNOLOGY, 2023, 466
  • [2] Using Generative Pre-Trained Transformers (GPT) for Electricity Price Trend Forecasting in the Spanish Market
    Medina, Alberto Menendez
    Alvaro, Jose Antonio Heredia
    ENERGIES, 2024, 17 (10)
  • [3] Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?
    Savelka, Jaromir
    Agarwal, Arav
    Bogart, Christopher
    Song, Yifan
    Sakr, Majd
    PROCEEDINGS OF THE 2023 CONFERENCE ON INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, ITICSE 2023, VOL 1, 2023, : 117 - 123
  • [4] Generative Pre-Trained Transformers for Biologically Inspired Design
    Zhu, Qihao
    Zhang, Xinyu
    Luo, Jianxi
    PROCEEDINGS OF ASME 2022 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2022, VOL 6, 2022
  • [5] Generative pre-trained transformers (GPT)-based automated data mining for building energy management: Advantages, limitations and the future
    Zhang, Chaobo
    Lu, Jie
    Zhao, Yang
    ENERGY AND BUILT ENVIRONMENT, 2024, 5 (01) : 143 - 169
  • [6] Automated data mining framework for building energy conservation aided by generative pre-trained transformers (GPT)
    Zhang, Chaobo
    Zhang, Jian
    Zhao, Yang
    Lu, Jie
    ENERGY AND BUILDINGS, 2024, 305
  • [7] Generative Pre-Trained Transformers (GPT) and Space Health: A Potential Frontier in Astronaut Health During Exploration Missions
    Waisberg, Ethan
    Ong, Joshua
    Masalkhi, Mouayad
    Zaman, Nasif
    Kamran, Sharif Amit
    Sarker, Prithul
    Lee, Andrew G.
    Tavakkoli, Alireza
    PREHOSPITAL AND DISASTER MEDICINE, 2023, 38 (04) : 532 - 536
  • [8] Are Pre-trained Convolutions Better than Pre-trained Transformers?
    Tay, Yi
    Dehghani, Mostafa
    Gupta, Jai
    Aribandi, Vamsi
    Bahri, Dara
    Qin, Zhen
    Metzler, Donald
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4349 - 4359
  • [9] Towards Summarizing Code Snippets Using Pre-Trained Transformers
    Mastropaolo, Antonio
    Ciniselli, Matteo
    Pascarella, Luca
    Tufano, Rosalia
    Aghajani, Emad
    Bavota, Gabriele
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 1 - 12
  • [10] Towards Summarizing Code Snippets Using Pre-Trained Transformers
    Mastropaolo, Antonio
    Tufano, Rosalia
    Ciniselli, Matteo
    Aghajani, Emad
    Pascarella, Luca
    Bavota, Gabriele
    arXiv preprint