Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12
Authors
Wu, Yueming [1 ,2 ,3 ]
Feng, Siyue [1 ,2 ,3 ]
Zou, Deqing [1 ,2 ,3 ]
Jin, Hai [1 ,3 ,4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China
[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China
[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Semantic Code Clones; Abstract Syntax Tree; Markov Chain; NEURAL-NETWORK;
DOI
10.1145/3551349.3560426
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method that transforms the original complex tree into simple Markov chains and measures the distance of all states in these chains. After obtaining all distance values, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
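The pipeline outlined in the abstract (tree to Markov chain, then per-state distances used as classifier features) can be illustrated with a minimal sketch. This is not the Amain implementation. It assumes Python's built-in `ast` module as the parser, treats each AST node type as a state, takes parent-to-child edges as transitions, and uses Euclidean distance between matching rows; the function names `transition_probs` and `state_distances` are hypothetical.

```python
import ast
import math
from collections import defaultdict


def transition_probs(source):
    """Map each AST node type (state) to its child-type transition probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            # Assumption: one parent -> child edge counts as one transition.
            counts[type(parent).__name__][type(child).__name__] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: n / total for nxt, n in row.items()}
    return probs


def state_distances(chain_a, chain_b):
    """Euclidean distance between the two chains' rows, one value per state."""
    features = {}
    for state in sorted(set(chain_a) | set(chain_b)):
        row_a = chain_a.get(state, {})
        row_b = chain_b.get(state, {})
        keys = set(row_a) | set(row_b)
        features[state] = math.sqrt(
            sum((row_a.get(k, 0.0) - row_b.get(k, 0.0)) ** 2 for k in keys)
        )
    return features


# Two fragments with the same structure but different identifiers.
code_a = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
code_b = "def g(items):\n    total = 0\n    for item in items:\n        total += item\n    return total\n"

features = state_distances(transition_probs(code_a), transition_probs(code_b))
print(features)
```

Because the two fragments differ only in identifier names, their chains coincide and every per-state distance is zero. In a full detector, a vector of such distances for each code pair would be the input to a trained classifier, as the abstract describes.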
Pages: 13
Related papers
50 in total
  • [1] Detecting Semantic Code Clones by Building AST-based Markov Chains Model
    Wu, Yueming
    Feng, Siyue
    Zou, Deqing
    Jin, Hai
    ACM International Conference Proceeding Series, 2022,
  • [2] Tritor: Detecting Semantic Code Clones by Building Social Network-Based Triads Model
    Zou, Deqing
    Feng, Siyue
    Wu, Yueming
    Suo, Wenqi
    Jin, Hai
    PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, : 771 - 783
  • [3] An AST-Based Code Plagiarism Detection Algorithm
    Zhao, Jingling
    Xia, Kunfeng
    Fu, Yilun
    Cui, Baojiang
    2015 10TH INTERNATIONAL CONFERENCE ON BROADBAND AND WIRELESS COMPUTING, COMMUNICATION AND APPLICATIONS (BWCCA 2015), 2015, : 178 - 182
  • [4] AST-Based Deep Learning for Detecting Malicious PowerShell
    Rusak, Gili
    Al-Dujaili, Abdullah
    O'Reilly, Una-May
    PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018, : 2276 - 2278
  • [5] AST-Based Source Code Migration Through Symbols Replacement
    Chen, Dawei
    Lawler, Duncan
    Zhang, Yi
    Kuchhal, Pramil
    Westcott, Derek
    Yang, Guanxiong
    Proceedings of IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2022, 2022,
  • [6] Enhancing structural knowledge in code smell identification: A fusion learning framework combining AST-based metrics with semantic embeddings
    Yang, Quanxin
    Yu, Dongjin
    Wang, Sixuan
    Xu, Yihang
    Chen, Xin
    Chen, Jie
    Hu, Bin
    Expert Systems with Applications, 2025, 263
  • [7] Automatic Identification of Vulnerable Code: Investigations with an AST-Based Neural Network
    Partenza, Garrett
    Amburgey, Trevor
    Deng, Lin
    Dehlinger, Josh
    Chakraborty, Suranjan
    2021 IEEE 45TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2021), 2021, : 1475 - 1482
  • [8] A AST AND CONTEXT BASED DUPLICATED CODE DETECTING METHOD
    Liu, Wei
    Liu, Chuanchang
    Gong, Yunzhan
    Chen, Junliang
    CIICT 2008: PROCEEDINGS OF CHINA-IRELAND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATIONS TECHNOLOGIES 2008, 2008, : 1 - 5
  • [9] MAMADROID: Detecting Android Malware by Building Markov Chains of Behavioral Models
    Mariconti, Enrico
    Onwuzurike, Lucky
    Andriotis, Panagiotis
    De Cristofaro, Emiliano
    Ross, Gordon
    Stringhini, Gianluca
    24TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2017), 2017,
  • [10] Detecting Website Vulnerabilities based on Markov Chains Theory
    Yassine, Ayachi
    Noureddine, Rahmoune
    Ettifouri, El Hassane
    Berrich, Jamal
    Toumi, Bouchentouf
    PROCEEDINGS OF 2016 5TH INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS (ICMCS), 2016, : 697 - 700