Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12
Authors
Wu, Yueming [1 ,2 ,3 ]
Feng, Siyue [1 ,2 ,3 ]
Zou, Deqing [1 ,2 ,3 ]
Jin, Hai [1 ,3 ,4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China
[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China
[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Semantic Code Clones; Abstract Syntax Tree; Markov Chain; NEURAL-NETWORK;
DOI
10.1145/3551349.3560426
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Code clone detection aims to find functionally similar code fragments, a task that is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code due to the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the original complex tree into simple Markov chains and measure the distance of all states in these chains. After obtaining all distance values, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
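The pipeline outlined in the abstract (tree to Markov chain, then one distance per state) can be illustrated with a minimal sketch. This is not Amain's implementation: it assumes Python's standard `ast` module as a stand-in parser, treats each AST node type as a state with parent-to-child edges as transitions, and uses cosine distance as the only metric. The names `transition_probs` and `state_distances` are hypothetical, and the abstract does not specify the state set, the distance metric, or the classifier.

```python
import ast
import math
from collections import defaultdict


def transition_probs(source):
    """Row-normalized parent -> child node-type transitions for one snippet.

    Assumption: states are Python AST node types; the abstract does not
    define the actual state set.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: n / total for nxt, n in row.items()}
    return probs


def state_distances(p, q):
    """Cosine distance between the two chains' rows, one value per state.

    Assumption: cosine distance is an illustrative choice of metric.
    A state missing from either chain gets the maximum distance 1.0.
    """
    out = {}
    for state in sorted(set(p) | set(q)):
        a, b = p.get(state, {}), q.get(state, {})
        keys = set(a) | set(b)
        dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        out[state] = 1.0 - dot / (na * nb) if na and nb else 1.0
    return out


# Two snippets with the same tree shape, and one with a different shape.
same_a = "def f(a):\n    return a + 1"
same_b = "def g(x):\n    return x + 2"
other = "def h(x):\n    for i in range(x):\n        print(i)"

close = state_distances(transition_probs(same_a), transition_probs(same_b))
far = state_distances(transition_probs(same_a), transition_probs(other))
```

The per-state values in `close` or `far` form a fixed-order feature vector once the state set is fixed, which is the kind of input a machine learning classifier could be trained on with labeled clone and non-clone pairs.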
Pages: 13
Related papers
50 in total
  • [11] Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning
    Yu, Hao
    Hu, Xing
    Li, Ge
    Li, Ying
    Wang, Qianxiang
    Xie, Tao
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2022, 31 (04)
  • [12] DETECTING LOCAL SEMANTIC CONCEPTS IN ENVIRONMENTAL SOUNDS USING MARKOV MODEL BASED CLUSTERING
    Lee, Keansub
    Ellis, Daniel P. W.
    Loui, Alexander C.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2278 - 2281
  • [13] Detecting Java Code Clones Based on Bytecode Sequence Alignment
    Yu, Dongjin
    Yang, Jiazha
    Chen, Xin
    Chen, Jie
    IEEE ACCESS, 2019, 7 : 22421 - 22433
  • [14] MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (Extended Version)
    Onwuzurike, Lucky
    Mariconti, Enrico
    Andriotis, Panagiotis
    De Cristofaro, Emiliano
    Ross, Gordon
    Stringhini, Gianluca
    ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2019, 22 (02)
  • [15] Detecting malicious JavaScript code based on semantic analysis
    Fang, Yong
    Huang, Cheng
    Su, Yu
    Qiu, Yaoyao
    COMPUTERS & SECURITY, 2020, 93
  • [16] A Retinex model based on Absorbing Markov Chains
    Gianini, Gabriele
    Rizzi, Alessandro
    Damiani, Ernesto
    INFORMATION SCIENCES, 2016, 327 : 149 - 174
  • [17] Classification model for code clones based on machine learning
    Jiachen Yang
    Keisuke Hotta
    Yoshiki Higo
    Hiroshi Igaki
    Shinji Kusumoto
    Empirical Software Engineering, 2015, 20 : 1095 - 1125
  • [18] Neural Detection of Semantic Code Clones via Tree-Based Convolution
    Yu, Hao
    Lam, Wing
    Chen, Long
    Li, Ge
    Xie, Tao
    Wang, Qianxiang
    2019 IEEE/ACM 27TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2019), 2019, : 70 - 80
  • [19] Classification model for code clones based on machine learning
    Yang, Jiachen
    Hotta, Keisuke
    Higo, Yoshiki
    Igaki, Hiroshi
    Kusumoto, Shinji
    EMPIRICAL SOFTWARE ENGINEERING, 2015, 20 (04) : 1095 - 1125
  • [20] Detecting Java Code Clones with Multi-Granularities Based on Bytecode
    Yu, Dongjin
    Wang, Jie
    Wu, Qing
    Yang, Jiazha
    Wang, Jiaojiao
    Yang, Wei
    Yan, Wei
    2017 IEEE 41ST ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2017, : 317 - 326