CCLearner: A Deep Learning-Based Clone Detection Approach

Citations: 134
Authors
Li, Liuqing [1]
Feng, He [1]
Zhuang, Wenjie [1]
Meng, Na [1]
Ryder, Barbara [1]
Affiliation
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Funding
National Science Foundation (USA)
Keywords
deep learning; clone detection; empirical
DOI
10.1109/ICSME.2017.46
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Programmers produce code clones when developing software. By copying and pasting code with or without modification, developers reuse existing code to improve programming productivity. However, code clones present challenges to software maintenance: they may require consistent application of the same or similar bug fixes or program changes to multiple code locations. To simplify the maintenance process, various tools have been proposed to automatically detect clones [1], [2], [3], [4], [5], [6]. Some tools tokenize source code, and then compare the sequence or frequency of tokens to reveal clones [1], [3], [4], [5]. Some other tools detect clones using tree-matching algorithms to compare the Abstract Syntax Trees (ASTs) of source code [2], [6]. In this paper, we present CCLearner, the first solely token-based clone detection approach leveraging deep learning. CCLearner extracts tokens from known method-level code clones and nonclones to train a classifier, and then uses the classifier to detect clones in a given codebase. To evaluate CCLearner, we reused BigCloneBench [7], an existing large benchmark of real clones. We used part of the benchmark for training and the other part for testing, and observed that CCLearner effectively detected clones. With the same data set, we conducted the first systematic comparison experiment between CCLearner and three popular clone detection tools. Compared with the approaches not using deep learning, CCLearner achieved competitive clone detection effectiveness with low time cost.
Pages: 249-259
Page count: 11
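
The abstract describes extracting tokens from pairs of methods and feeding them to a classifier. As a rough illustration only, the sketch below turns a method pair into a small vector of token-frequency similarity scores that a classifier could take as input. The keyword set, the four token categories, the similarity formula, and the function names (`token_counts`, `category_similarity`, `pair_features`) are all assumptions made for this example; the abstract does not specify them, and the classifier itself is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately small keyword set; a real tool would use a full lexer.
KEYWORDS = {"int", "return", "for", "if", "else", "while", "void", "public", "static"}

# Words, integer literals, or any single non-space symbol character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")

CATEGORIES = ("keyword", "identifier", "literal", "symbol")


def token_counts(code):
    """Count how often each token occurs, grouped by an assumed category."""
    counts = {name: Counter() for name in CATEGORIES}
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            counts["keyword"][tok] += 1
        elif tok[0].isalpha() or tok[0] == "_":
            counts["identifier"][tok] += 1
        elif tok[0].isdigit():
            counts["literal"][tok] += 1
        else:
            counts["symbol"][tok] += 1
    return counts


def category_similarity(a, b):
    """Assumed score: 1 - sum|fa - fb| / sum(fa + fb); 1.0 if both are empty."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0
    diff = sum(abs(a[t] - b[t]) for t in set(a) | set(b))
    return 1.0 - diff / total


def pair_features(code_a, code_b):
    """One similarity score per category, in the order given by CATEGORIES."""
    ca, cb = token_counts(code_a), token_counts(code_b)
    return [category_similarity(ca[name], cb[name]) for name in CATEGORIES]


method_a = "int sum(int a, int b) { return a + b; }"
method_b = "int add(int x, int y) { return x + y; }"
features = pair_features(method_a, method_b)
print(features)  # [1.0, 0.0, 1.0, 1.0]
```

In this toy pair the keywords and symbols match exactly while every identifier is renamed, so only the identifier score drops to 0.0. Vectors like this, labeled as clone or nonclone, are the kind of input a trained classifier could consume.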