Assessing the Code Clone Detection Capability of Large Language Models

Cited by: 0

Authors:

Zhang, Zixian [1]

Saber, Takfarinas [1]

Affiliations:

[1] Univ Galway, Sch Comp Sci, Galway, Ireland

Source:

PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON CODE QUALITY, ICCQ 2024 | 2024

Funding:

Science Foundation Ireland;

Keywords:

Code Clone Detection; Large Language Models (LLMs); GPT-3.5; GPT-4; Semantic Analysis;

DOI:

10.1109/ICCQ60895.2024.10576803

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202; 0835;

Abstract:

This study assesses the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, on the task of code clone detection. The evaluation tests the models on code pairs of different clone types and levels of similarity, sourced from two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). The findings indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types. A correlation was observed between the GPTs' accuracy at identifying code clones and code similarity, with both GPT models exhibiting low effectiveness in detecting the most complex Type-4 code clones. Additionally, the GPT models identify code clones more accurately in LLM-generated code than in human-generated code. However, they do not reach impressive accuracy. These results emphasize the need for ongoing enhancements in LLM capabilities, particularly in recognizing code clones and in mitigating their predisposition towards self-generated code clones, which is likely to become an issue as more software engineers leverage LLM-enabled code generation and code refactoring tools.


Pages: 9
