Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12
Authors
Wu, Yueming [1 ,2 ,3 ]
Feng, Siyue [1 ,2 ,3 ]
Zou, Deqing [1 ,2 ,3 ]
Jin, Hai [1 ,3 ,4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China
[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China
[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Semantic Code Clones; Abstract Syntax Tree; Markov Chain; NEURAL-NETWORK;
DOI
10.1145/3551349.3560426
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the original complex tree into simple Markov chains and measure the distance of all states in these chains. After obtaining all distance values, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
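The pipeline described in the abstract (tree to Markov chain, then per-state distances used as classifier features) can be illustrated with a minimal sketch. This is not Amain's implementation: this record does not give the paper's target language, node types, or distance measures. The sketch assumes Python's built-in `ast` module as the parser, treats each AST node type as a state with parent-to-child edges as transitions, and uses Euclidean distance per state. The helper names `transition_probabilities` and `state_distances` are hypothetical.

```python
import ast
import math
from collections import defaultdict


def transition_probabilities(source):
    """Map each parent node type to a probability distribution over child node types."""
    counts = defaultdict(lambda: defaultdict(int))
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    chain = {}
    for state, row in counts.items():
        total = sum(row.values())
        # Normalize the counts so each state's outgoing transitions sum to 1.
        chain[state] = {nxt: n / total for nxt, n in row.items()}
    return chain


def state_distances(chain_a, chain_b):
    """Euclidean distance between the two transition rows of every state (an assumed measure)."""
    distances = {}
    for state in sorted(set(chain_a) | set(chain_b)):
        row_a = chain_a.get(state, {})
        row_b = chain_b.get(state, {})
        targets = set(row_a) | set(row_b)
        distances[state] = math.sqrt(
            sum((row_a.get(t, 0.0) - row_b.get(t, 0.0)) ** 2 for t in targets)
        )
    return distances


# Two fragments that differ only in identifier names.
loop_sum = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n"
renamed = "def g(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"

# One distance value per state; a vector like this would be the classifier input.
features = state_distances(transition_probabilities(loop_sum), transition_probabilities(renamed))
```

Because the two fragments differ only in identifier names, their node-type chains are identical and every per-state distance is zero. In a full detector, such distance vectors for labeled clone and non-clone pairs would be used to train a classifier, as the abstract describes.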
Pages: 13
Related papers
50 in total
  • [41] Multiple Similarity-based Features Blending for Detecting Code Clones using Consensus-Driven Classification
    Sheneamer, Abdullah M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 183
  • [42] Multiple Model-Based Control Using Finite Controlled Markov Chains
    Enso Ikonen
    Kaddour Najim
    Cognitive Computation, 2009, 1 : 234 - 243
  • [43] TWO-LEVEL BALANCE MODEL OF PRODUCTS DISTRIBUTION BASED ON MARKOV CHAINS
    Lapshyn, V., I
    Kuznichenko, V. M.
    Stetsenko, T., V
    FINANCIAL AND CREDIT ACTIVITY-PROBLEMS OF THEORY AND PRACTICE, 2018, 2 (25): : 219 - 225
  • [44] Forecast for Exchange Rate of RMB Based on Markov Chains and Revised AR Model
    Wu, Lihua
    Jiang, Yi
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON PUBLIC ECONOMICS AND MANAGEMENT ICPEM 2009, VOL 2: ECONOMIC POLICIES, PLANNING AND ASSESSMENT, 2009, : 449 - 452
  • [45] Multiple Model-Based Control Using Finite Controlled Markov Chains
    Ikonen, Enso
    Najim, Kaddour
    COGNITIVE COMPUTATION, 2009, 1 (03) : 234 - 243
  • [46] Entity-Relation Extraction Based on Semantic Operators and Markov Chain Model
    Yang Hong
    Chen Rong
    Liu Ya-qing
    2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 4, 2011, : 148 - 151
  • [47] Detecting Chinese IPO Market Cycles Based on Markov Regime Switching Model
    Hu Zhiqiang
    Zhang Yunyun
    Hu Weining
    CONTEMPORARY INNOVATION AND DEVELOPMENT IN STATISTICAL SCIENCE, 2012, : 408 - 417
  • [48] Markov Model-Based Building Deterioration Prediction and ISO Factor Analysis for Building Management
    Edirisinghe, Ruwini
    Setunge, Sujeeva
    Zhang, Guomin
    JOURNAL OF MANAGEMENT IN ENGINEERING, 2015, 31 (06)
  • [49] Model-based furniture recognition for building semantic object maps
    Guenther, Martin
    Wiemann, Thomas
    Albrecht, Sven
    Hertzberg, Joachim
    ARTIFICIAL INTELLIGENCE, 2017, 247 : 336 - 351
  • [50] AI based Semantic Extensibility and Querying Techniques for Building Information Model
    Raghavi, V
    Gowtham, R.
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS), 2019, : 1497 - 1501