SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers

Cited by: 0
Authors
Alahmadi, Mohammad D. [1 ]
Alshangiti, Moayad [1 ]
Alsubhi, Jumana [2 ]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 23890, Saudi Arabia
[2] Univ Georgia, Sch Comp, Athens, GA 30602 USA
Keywords
SCC (Source Code Classification); NLP (Natural Language Processing); Large Language Model (LLM)
DOI
10.3390/math12132128
Chinese Library Classification (CLC)
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
Developers often rely on online resources, such as Stack Overflow (SO), for assistance with programming tasks. Effective search and resource discovery depend on questions and posts being tagged with the appropriate programming language, yet manual tagging is not consistently accurate, motivating the automated classification of code snippets into the correct programming language as a tag. In this study, we introduce a novel approach that automatically classifies code snippets from SO posts into programming languages using generative pre-trained transformers (GPT). Our method, which requires neither additional training on labeled data nor pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ the text-davinci-003 model from the GPT-3.5 family and post-process its responses to accurately identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving median F1-score improvements ranging from +6% to +31%. These findings underscore the effectiveness of SCC-GPT for code snippet classification and offer a cost-effective, efficient solution for developers who rely on SO for programming assistance.
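
For illustration, the pipeline the abstract describes (prompt a GPT model with a code snippet, then post-process its free-text reply into a language label) might look like the minimal Python sketch below. It assumes the legacy OpenAI Completions API (openai-python < 1.0, matching the era of text-davinci-003, which OpenAI has since deprecated); the prompt wording, the example label set, and the normalization rules are illustrative assumptions, not the authors' exact implementation.

import re
import openai  # legacy openai-python (< 1.0) Completions API

openai.api_key = "YOUR_API_KEY"  # placeholder

# Illustrative label set: the paper targets 19 languages, but its exact
# list may differ from this assumption.
LANGUAGES = {
    "python", "java", "javascript", "c", "c++", "c#", "php", "ruby",
    "go", "swift", "html", "css", "sql", "bash", "r", "objective-c",
    "perl", "scala", "kotlin",
}

def classify_snippet(snippet: str) -> str:
    """Ask text-davinci-003 for the snippet's language, then normalize."""
    prompt = (
        "Identify the programming language of the following code snippet. "
        "Answer with the language name only.\n\n" + snippet
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=8,
        temperature=0,  # deterministic replies simplify post-processing
    )
    raw = response["choices"][0]["text"].strip().lower()
    # Post-process: drop punctuation and whitespace, keep known labels only.
    raw = re.sub(r"[^a-z0-9+#-]", "", raw)
    return raw if raw in LANGUAGES else "unknown"

print(classify_snippet("def add(a, b):\n    return a + b"))  # -> python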
Pages: 12
Related Papers
50 results
  • [1] Generative pre-trained transformers (GPT) for surface engineering
    Kamnis, Spyros
    SURFACE & COATINGS TECHNOLOGY, 2023, 466
  • [2] Using Generative Pre-Trained Transformers (GPT) for Electricity Price Trend Forecasting in the Spanish Market
    Medina, Alberto Menendez
    Alvaro, Jose Antonio Heredia
    ENERGIES, 2024, 17 (10)
  • [3] Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?
    Savelka, Jaromir
    Agarwal, Arav
    Bogart, Christopher
    Song, Yifan
    Sakr, Majd
    PROCEEDINGS OF THE 2023 CONFERENCE ON INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, ITICSE 2023, VOL 1, 2023, : 117 - 123
  • [4] Generative Pre-Trained Transformers for Biologically Inspired Design
    Zhu, Qihao
    Zhang, Xinyu
    Luo, Jianxi
    PROCEEDINGS OF ASME 2022 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2022, VOL 6, 2022
  • [5] Generative pre-trained transformers (GPT)-based automated data mining for building energy management: Advantages, limitations and the future
    Zhang, Chaobo
    Lu, Jie
    Zhao, Yang
    ENERGY AND BUILT ENVIRONMENT, 2024, 5 (01) : 143 - 169
  • [6] Automated data mining framework for building energy conservation aided by generative pre-trained transformers (GPT)
    Zhang, Chaobo
    Zhang, Jian
    Zhao, Yang
    Lu, Jie
    ENERGY AND BUILDINGS, 2024, 305
  • [7] Generative Pre-Trained Transformers (GPT) and Space Health: A Potential Frontier in Astronaut Health During Exploration Missions
    Waisberg, Ethan
    Ong, Joshua
    Masalkhi, Mouayad
    Zaman, Nasif
    Kamran, Sharif Amit
    Sarker, Prithul
    Lee, Andrew G.
    Tavakkoli, Alireza
    PREHOSPITAL AND DISASTER MEDICINE, 2023, 38 (04) : 532 - 536
  • [8] Are Pre-trained Convolutions Better than Pre-trained Transformers?
    Tay, Yi
    Dehghani, Mostafa
    Gupta, Jai
    Aribandi, Vamsi
    Bahri, Dara
    Qin, Zhen
    Metzler, Donald
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4349 - 4359
  • [9] Towards Summarizing Code Snippets Using Pre-Trained Transformers
    Mastropaolo, Antonio
    Ciniselli, Matteo
    Pascarella, Luca
    Tufano, Rosalia
    Aghajani, Emad
    Bavota, Gabriele
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 1 - 12
  • [10] Towards Summarizing Code Snippets Using Pre-Trained Transformers
    Mastropaolo, Antonio
    Tufano, Rosalia
    Ciniselli, Matteo
    Aghajani, Emad
    Pascarella, Luca
    Bavota, Gabriele
    arXiv preprint