Assessing the Code Clone Detection Capability of Large Language Models

Authors
Zhang, Zixian [1]
Saber, Takfarinas [1]
Affiliations
[1] Univ Galway, Sch Comp Sci, Galway, Ireland
Funding
Science Foundation Ireland
Keywords
Code Clone Detection; Large Language Models (LLMs); GPT-3.5; GPT-4; Semantic Analysis;
DOI
10.1109/ICCQ60895.2024.10576803
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This study assesses the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, on the task of code clone detection. The evaluation tests the models on code pairs of different clone types and levels of similarity, sourced from two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). The findings indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types. A correlation was observed between the GPT models' accuracy at identifying code clones and code similarity, with both models exhibiting low effectiveness in detecting the most complex Type-4 code clones. Additionally, the GPT models perform better at identifying code clones in LLM-generated code than in human-written code. However, they do not reach impressive accuracy. These results emphasize the need for ongoing enhancements in LLM capabilities, particularly in recognizing code clones and in mitigating their predisposition towards self-generated code clones, which is likely to become an issue as more software engineers adopt LLM-enabled code generation and code refactoring tools.
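As an illustration only, the per-clone-type comparison described in the abstract can be sketched as a simple accuracy breakdown. This is not the authors' evaluation code: the function name `accuracy_by_clone_type`, the record layout, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_clone_type(records):
    """Return the fraction of correct clone/non-clone verdicts per clone type.

    `records` is an iterable of (clone_type, is_clone_truth, is_clone_predicted)
    tuples. This layout is hypothetical, chosen for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for clone_type, truth, predicted in records:
        total[clone_type] += 1
        if truth == predicted:
            correct[clone_type] += 1
    return {t: correct[t] / total[t] for t in total}


# Made-up toy verdicts, not results from the paper.
sample = [
    ("Type-1", True, True),
    ("Type-1", True, True),
    ("Type-4", True, False),
    ("Type-4", True, True),
]
result = accuracy_by_clone_type(sample)
```

Grouping by clone type rather than reporting one overall score is what makes a gap such as weaker Type-4 detection visible.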
Pages: 9