A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges

Cited by: 10
Authors
Zakeri-Nasrabadi, Morteza [1]
Parsa, Saeed [1]
Ramezani, Mohammad [1]
Roy, Chanchal [2]
Ekhtiarzadeh, Masoud [1]
Affiliations
[1] Iran Univ Sci & Technol, Sch Comp Engn, Hengam St, Resalat Sq, Tehran 1684613114, Iran
[2] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
Keywords
Source code similarity; Code clone; Plagiarism detection; Code recommendation; Systematic literature review; SOFTWARE; BENCHMARK; FRAMEWORK; EFFICIENT; PROGRAMS; NICAD; COPY
DOI
10.1016/j.jss.2023.111796
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focuses on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance. © 2023 Elsevier Inc. All rights reserved.
Pages: 33