Indexing source code and clone detection

Author
Tronicek, Zdenek [1]
Affiliation
[1] Tarleton State Univ, Coll Sci & Technol, Stephenville, TX 76401 USA
Keywords
Clone detection; Code clones; Indexing; Tree pattern matching; ACCURATE; SEARCH;
DOI
10.1016/j.infsof.2021.106805
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Searching source code is a common task in code recommendation systems as well as in many other areas. Clone detection is used in software maintenance and bug detection.
Objective: The paper introduces an algorithm for building the index structure of abstract syntax trees. When the index structure is built, a pattern tree can be found in time linear in the length of the pattern. Furthermore, the paper describes DrDup2 and DrDupLex, two open-source tools that use the index structure to find Type-2 clones.
Method: The index structure presented in this paper is based on the trie, which is a fundamental data structure in computer science. Evaluation of the presented clone detectors is done on BigCloneBench, which is a well-established benchmark for clone detection.
Results: Comparison with three state-of-the-art clone detectors (NiCad, CloneWorks and SourcererCC) shows that DrDup2 and DrDupLex are able to beat them in precision, recall and running time.
Conclusion: The presented index structure can be used for example to speed up searching for code fragments in code recommendation systems. It is also shown that it can be used to detect Type-2 clones with high precision and recall.
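
Illustration (not taken from the paper): the abstract states only that the index is trie-based and that a pattern tree can be found in time linear in the pattern length. The Python sketch below shows one simple way such a lookup could work, assuming a toy AST. Every name in it (`Node`, `serialize`, `SubtreeIndex`) is hypothetical, and it is not the algorithm implemented in DrDup2 or DrDupLex. Each subtree is serialized in preorder as (kind, arity) pairs and inserted into a trie. Token text is dropped, so subtrees that differ only in identifier names or literal values share a path, which loosely mirrors Type-2 clones. Construction here is deliberately naive and re-serializes every subtree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node: a node kind, optional token text, ordered children."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def serialize(node):
    """Return the preorder sequence of (kind, arity) pairs for a subtree.

    Token text is deliberately ignored, so subtrees that differ only in
    identifier names or literal values serialize identically. Recording
    the arity makes the sequence determine the tree shape.
    """
    seq = [(node.kind, len(node.children))]
    for child in node.children:
        seq.extend(serialize(child))
    return seq


class TrieNode:
    def __init__(self):
        self.edges = {}        # (kind, arity) -> TrieNode
        self.occurrences = []  # source labels of subtrees ending here


class SubtreeIndex:
    """Hypothetical trie index over all subtrees of the added trees."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, tree, label):
        """Index every subtree of `tree` under the given source label."""
        stack = [tree]
        while stack:
            node = stack.pop()
            cur = self.root
            for key in serialize(node):
                cur = cur.edges.setdefault(key, TrieNode())
            cur.occurrences.append(label)
            stack.extend(node.children)

    def find(self, pattern):
        """Follow one trie edge per pattern node: linear in pattern size."""
        cur = self.root
        for key in serialize(pattern):
            cur = cur.edges.get(key)
            if cur is None:
                return []
        return cur.occurrences


def assign(name, value):
    """Build a small 'name = value' subtree for the example."""
    return Node("Assign", children=[Node("Name", name), Node("Literal", value)])


index = SubtreeIndex()
index.add(Node("Block", children=[assign("x", "1"), assign("y", "2")]), "a.java")
index.add(Node("Block", children=[assign("total", "0")]), "b.java")

# The pattern uses different names and values but has the same structure.
hits = index.find(assign("anything", "42"))
```

In this sketch, `find` performs one dictionary lookup per pattern node, which is where the linear-time lookup comes from. Matching is limited to whole subtrees.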
Pages: 19