Indexing source code and clone detection

Cited by: 1
Authors
Tronicek, Zdenek [1]
Affiliations
[1] Tarleton State Univ, Coll Sci & Technol, Stephenville, TX 76401 USA
Keywords
Clone detection; Code clones; Indexing; Tree pattern matching; ACCURATE; SEARCH;
DOI
10.1016/j.infsof.2021.106805
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Context: Searching source code is a common task in code recommendation systems as well as in many other areas. Clone detection is used in software maintenance and bug detection.
Objective: The paper introduces an algorithm for building the index structure of abstract syntax trees. When the index structure is built, a pattern tree can be found in time linear in the length of the pattern. Furthermore, the paper describes DrDup2 and DrDupLex, two open-source tools that use the index structure to find Type-2 clones.
Method: The index structure presented in this paper is based on the trie, which is a fundamental data structure in computer science. Evaluation of the presented clone detectors is done on BigCloneBench, which is a well-established benchmark for clone detection.
Results: Comparison with three state-of-the-art clone detectors (NiCad, CloneWorks and SourcererCC) shows that DrDup2 and DrDupLex are able to beat them in precision, recall and running time.
Conclusion: The presented index structure can be used for example to speed up searching for code fragments in code recommendation systems. It is also shown that it can be used to detect Type-2 clones with high precision and recall.
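To make the idea of a trie-based index over abstract syntax trees more concrete, here is a minimal Python sketch. It is not the algorithm from the paper and not how DrDup2 or DrDupLex are implemented; it only illustrates the general technique under simplifying assumptions. The names `Node`, `serialize`, `normalize` and `TrieIndex`, and the `kind:value` label convention for identifiers and literals, are hypothetical. Every subtree is serialized in preorder (label plus child count) and inserted into a trie, so a lookup walks one trie edge per pattern node. The naive construction shown here can take quadratic time in the size of the indexed tree, and it only finds patterns that match a complete subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node; labels such as "id:a" or "lit:1" are an assumed convention."""
    label: str
    children: list = field(default_factory=list)


def serialize(node):
    """Preorder token list; the child count makes the encoding unambiguous."""
    tokens = [f"{node.label}/{len(node.children)}"]
    for child in node.children:
        tokens.extend(serialize(child))
    return tokens


def subtrees(node):
    """Yield every subtree in preorder, starting with the node itself."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def normalize(node):
    """Drop identifier names and literal values ("id:a" -> "id"), so that
    fragments differing only in those (Type-2 clones) serialize identically."""
    label = node.label.split(":", 1)[0]
    return Node(label, [normalize(child) for child in node.children])


class TrieIndex:
    """Trie over serialized subtrees; the key None stores the occurrence list."""

    def __init__(self):
        self.root = {}

    def add(self, tree, source):
        # Naive construction: insert the serialization of every subtree.
        for position, subtree in enumerate(subtrees(tree)):
            level = self.root
            for token in serialize(subtree):
                level = level.setdefault(token, {})
            level.setdefault(None, []).append((source, position))

    def find(self, pattern):
        # One dictionary lookup per pattern node.
        level = self.root
        for token in serialize(pattern):
            level = level.get(token)
            if level is None:
                return []
        return level.get(None, [])
```

With this sketch, indexing the normalized trees for `a + 1` and `b + 2` and then searching for the normalized tree of `x + 0` returns both occurrences, which is the sense in which identical trie paths correspond to Type-2 clone candidates.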
Pages: 19
Related papers
50 in total
  • [1] Gapped Code Clone Detection with Lightweight Source Code Analysis
    Murakami, Hiroaki
    Hotta, Keisuke
    Higo, Yoshiki
    Igaki, Hiroshi
    Kusumoto, Shinji
    [J]. 2013 IEEE 21ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC), 2013, : 93 - 102
  • [2] Clone detection in source code by frequent itemset techniques
    Wahler, V
    Seipel, D
    Von Gudenberg, JW
    Fischer, G
    [J]. FOURTH IEEE INTERNATIONAL WORKSHOP ON SOURCE CODE ANALYSIS AND MANIPULATION, PROCEEDINGS, 2004, : 128 - 135
  • [3] Source Code Clone Detection Using Unsupervised Similarity Measures
    Martinez-Gil, Jorge
    [J]. SOFTWARE QUALITY AS A FOUNDATION FOR SECURITY, SWQD 2024, 2024, 505 : 21 - 37
  • [4] Semantic Clone Detection: Can Source Code Comments Help?
    Ghosh, Akash
    Kuttal, Sandeep Kaur
    [J]. 2018 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC), 2018, : 315 - 317
  • [5] STUBBER: Compiling Source Code into Bytecode without Dependencies for Java Code Clone Detection
    Schafer, Andre
    Amme, Wolfram
    Heinze, Thomas S.
    [J]. 2021 IEEE 15TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES, IWSC 2021, 2021, : 29 - 35
  • [6] Refactoring Code Clone Detection
    Othman, Zhala Sarkawt
    Kaya, Mehmet
    [J]. 2019 7TH INTERNATIONAL SYMPOSIUM ON DIGITAL FORENSICS AND SECURITY (ISDFS), 2019,
  • [7] Prioritizing Code Clone Detection Results for Clone Management
    Venkatasubramanyam, Radhika D.
    Gupta, Shrinath
    Singh, Himanshu Kumar
    [J]. 2013 7TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES (IWSC), 2013, : 30 - 36
  • [8] CCFinder: A multilinguistic token-based code clone detection system for large scale source code
    Kamiya, T
    Kusumoto, S
    Inoue, K
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2002, 28 (07) : 654 - 670
  • [9] Deep Learning Code Fragments for Code Clone Detection
    White, Martin
    Tufano, Michele
    Vendome, Christopher
    Poshyvanyk, Denys
    [J]. 2016 31ST IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE), 2016, : 87 - 98
  • [10] Generalizability of Code Clone Detection on CodeBERT
    Sonnekalb, Tim
    Gruner, Bernd
    Brust, Clemens-Alexander
    Mäder, Patrick
    [J]. arXiv, 2022,