Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12
Authors
Wu, Yueming [1, 2, 3]
Feng, Siyue [1, 2, 3]
Zou, Deqing [1, 2, 3]
Jin, Hai [1, 3, 4]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China
[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China
[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Semantic Code Clones; Abstract Syntax Tree; Markov Chain; NEURAL-NETWORK;
DOI
10.1145/3551349.3560426
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods can handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method that transforms the original complex tree into simple Markov chains and measures the distances of all states in these chains. After obtaining all distance values, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
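The pipeline outlined in the abstract (tree to Markov chain, then per-state distances, then a classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: it parses Python source with the standard `ast` module, treats each AST node type as a state and each parent-to-child edge as a transition, and uses Euclidean distance between matching rows of the two transition tables. The helper names `transition_probs` and `state_distances` are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def transition_probs(source):
    """Return row-normalised parent-type -> child-type transition frequencies.

    Assumption: a state is an AST node type and a transition is a
    parent-to-child edge; the abstract does not specify these details.
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def state_distances(chain_a, chain_b):
    """Euclidean distance between the two chains' rows, one value per state.

    A state missing from one chain is treated as an all-zero row.
    """
    distances = {}
    for state in sorted(set(chain_a) | set(chain_b)):
        row_a = chain_a.get(state, {})
        row_b = chain_b.get(state, {})
        keys = set(row_a) | set(row_b)
        distances[state] = math.sqrt(
            sum((row_a.get(k, 0.0) - row_b.get(k, 0.0)) ** 2 for k in keys)
        )
    return distances


# Two functionally similar fragments written differently.
LOOP_SUM = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
BUILTIN_SUM = "def g(xs):\n    return sum(xs)\n"

# One distance per state; a vector like this would be the classifier input.
features = state_distances(transition_probs(LOOP_SUM), transition_probs(BUILTIN_SUM))
```

In this sketch the `features` mapping plays the role of the distance values that the abstract says are fed to a machine learning classifier; the classifier itself, and the choice of distance measures, are left out because the abstract does not specify them.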
Pages: 13
Related papers
50 in total
  • [21] ASKDetector: An AST-Semantic and Key Features Fusion based Code Comment Mismatch Detector
    Yang, Haiyang
    Chen, Hao
    Kuai, Zhirui
    Tu, Shuyuan
    Kuang, Li
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 392 - 402
  • [22] ASTENS-BWA: Searching partial syntactic similar regions between source code fragments via AST-based encoded sequence alignment
    Yu, Yaoshen
    Huang, Zhiqiu
    Shen, Guohua
    Li, Weiwei
    Shao, Yichao
    SCIENCE OF COMPUTER PROGRAMMING, 2022, 222
  • [23] PROBABILISTIC MODEL OF LANDSLIDE PROCESSES BASED ON MARKOV CHAINS
    Victorov, Alexey
    SCIENCE AND TECHNOLOGIES IN GEOLOGY, EXPLORATION AND MINING, SGEM 2015, VOL II, 2015, : 579 - 586
  • [24] Research on estimating effect model based on Markov chains
    Li, Qing-Min
    Wang, Hong-Wei
    Li, Hua
    Liu, Jun
    Wuhan Ligong Daxue Xuebao/Journal of Wuhan University of Technology, 2008, 30 (01): : 120 - 122
  • [25] A Model of Music Perceptual Theory Based on Markov Chains
    Wen, Ru
    Chen, Kai
    Zhang, Yilin
    Huang, Wenmin
    Tian, Jiyuan
    Xu, Kuan
    Wu, Jiang
    PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 1099 - 1105
  • [26] An Approach of Diagnosis Based On The Hidden Markov Chains Model
    Bouamrane, Karim
    Djebbar, Amel
    Atmani, Baghdad
    COMPUTER SCIENCE JOURNAL OF MOLDOVA, 2008, 16 (02) : 256 - 268
  • [27] A Risk Model Based on Markov Chains with Marked Transitions
    Ren, Jiandong
    STOCHASTIC MODELS, 2013, 29 (02) : 258 - 272
  • [28] Goner: Building Tree-Based N-Gram-Like Model for Semantic Code Clone Detection
    Wu, Yueming
    Feng, Siyue
    Suo, Wenqi
    Zou, Deqing
    Jin, Hai
    IEEE TRANSACTIONS ON RELIABILITY, 2024, 73 (02) : 1310 - 1324
  • [29] MODEL OF FORMATION OF LINEAR POLYMERS, BASED ON THEORY OF MARKOV CHAINS
    GAVRILET.VN
    STRELTSO.AA
    ZAVODSKAYA LABORATORIYA, 1973, (03): : 323 - 326
  • [30] A Heuristic Model for Spare Parts Stocking Based on Markov Chains
    Pacheco-Velazquez, Ernesto Armando
    Robles-Cardenas, Manuel
    Ordonez, Saul Juarez
    Solis, Abelardo Ernesto Damy
    Cardenas-Barron, Leopoldo Eduardo
    MATHEMATICS, 2023, 11 (16)