SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers

Cited by: 0
Authors
Alahmadi, Mohammad D. [1 ]
Alshangiti, Moayad [1 ]
Alsubhi, Jumana [2 ]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 23890, Saudi Arabia
[2] Univ Georgia, Sch Comp, Athens, GA 30602 USA
Keywords
SCC (Source Code Classification); NLP (Natural Language Processing); Large Language Model (LLM)
DOI
10.3390/math12132128
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
Developers often rely on online resources, such as Stack Overflow (SO), to seek assistance with programming tasks. To enable effective search and resource discovery, questions and posts must be manually tagged with the appropriate programming language. However, tagging is not consistently accurate, motivating the automated classification of code snippets into the correct programming language as a tag. In this study, we introduce a novel approach that automatically classifies code snippets from SO posts into programming languages using generative pre-trained transformers (GPT). Our method, which requires neither additional training on labeled data nor pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ OpenAI's text-davinci-003 model (a GPT-3.5 model) and postprocess its responses to identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving a median F1-score improvement ranging from +6% to +31%. These findings underscore the effectiveness of SCC-GPT for code snippet classification, offering a cost-effective and efficient solution for developers who rely on SO for programming assistance.
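The postprocessing step the abstract mentions can be sketched as follows. This is an illustrative reconstruction, not the paper's actual implementation: the 19-language label set and the normalization rules are assumptions made for the example, since the abstract does not list them.

```python
# Hypothetical postprocessing: map a GPT model's free-text reply to one of a
# fixed set of programming-language tags. The label set below is illustrative.
LANGUAGES = {
    "python", "java", "javascript", "c", "c++", "c#", "php", "ruby",
    "go", "swift", "kotlin", "r", "scala", "perl", "rust", "sql",
    "html", "css", "bash",
}

def normalize_label(response: str) -> str:
    """Return the first known language mentioned in the reply, else 'unknown'."""
    text = response.strip().lower()
    # Exact match first (e.g. the model replied just "Python").
    if text in LANGUAGES:
        return text
    # Otherwise scan the reply for any known language token.
    for token in text.replace(",", " ").replace(".", " ").split():
        if token in LANGUAGES:
            return token
    return "unknown"
```

A function like this turns verbose model replies (e.g. "This snippet is written in Java.") into a single canonical tag that can be scored against the ground-truth labels.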
Pages: 12