Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12
Authors
Wu, Yueming [1, 2, 3]
Feng, Siyue [1, 2, 3]
Zou, Deqing [1, 2, 3]
Jin, Hai [1, 3, 4]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China
[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China
[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Semantic Code Clones; Abstract Syntax Tree; Markov Chain; NEURAL-NETWORK;
DOI
10.1145/3551349.3560426
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method that transforms the original complex tree into simple Markov chains and measures the distance between all states in these chains. After obtaining all distance values, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
Pages: 13
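
The pipeline summarized in the abstract (tree to Markov chain, then per-state distances, then a classifier) can be illustrated with a minimal sketch. This is not the Amain implementation: it assumes Python's built-in `ast` module as the parser, treats AST node types as the chain states, estimates transition probabilities from parent-to-child edges, and uses cosine distance, all of which are illustrative choices that the abstract does not specify. The function names `transition_matrix`, `cosine_distance`, and `state_distances` are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def transition_matrix(source):
    """Estimate parent -> child node-type transition probabilities from an AST.

    Assumption: states are AST node type names; the paper may define them differently.
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    # Normalize each row so that it sums to 1, giving a Markov chain over node types.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def cosine_distance(p, q):
    """Cosine distance between two sparse rows (an assumed distance measure)."""
    if not p and not q:
        return 0.0  # the state occurs in neither fragment
    if not p or not q:
        return 1.0  # the state occurs in only one fragment
    dot = sum(p[k] * q[k] for k in set(p) & set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


def state_distances(src_a, src_b, states):
    """Return one distance per state: a fixed-length feature vector for a code pair."""
    chain_a = transition_matrix(src_a)
    chain_b = transition_matrix(src_b)
    return [
        cosine_distance(chain_a.get(s, {}), chain_b.get(s, {}))
        for s in states
    ]
```

Under these assumptions, each pair of fragments yields one fixed-length vector, for example `state_distances(code_a, code_b, ["FunctionDef", "For", "BinOp"])`, and a set of such vectors with clone or non-clone labels could then be passed to any off-the-shelf classifier. Because the features have a fixed length regardless of tree size, the comparison avoids matching the trees directly, which is the scalability argument the abstract makes.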