Detecting Semantic Code Clones by Building AST-based Markov Chains Model

Cited by: 12

Authors:

Wu, Yueming [1, 2, 3]

Feng, Siyue [1, 2, 3]

Zou, Deqing [1, 2, 3]

Jin, Hai [1, 3, 4]

Affiliations:

[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China

[2] HUST, Sch Cyber Sci & Engn, Hubei Engn Res Ctr Big Data Secur, Wuhan 430074, Peoples R China

[3] HUST, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Wuhan 430074, Peoples R China

[4] HUST, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China

Source:

PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022 | 2022

Funding:

National Science Foundation (United States)

Keywords:

Semantic Code Clones; Abstract Syntax Tree; Markov Chain; Neural Network

DOI:

10.1145/3551349.3560426

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Code clone detection aims to find functionally similar code fragments and is an increasingly important task in software engineering. Many code clone detection methods have been proposed; among them, tree-based methods can handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector that builds Markov chain models. Specifically, we propose a novel method that transforms the original complex tree into simple Markov chains and measures the distance for every state in these chains. We then feed all of the resulting distance values into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain outperforms nine state-of-the-art code clone detection tools (i.e., SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, FCCA, DeepSim, and SCDetector).
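
The pipeline described in the abstract (tree, then Markov chain, then per-state distances, then classifier) can be sketched in a few lines. This is a minimal, hypothetical illustration, not Amain's implementation: it uses Python's built-in `ast` module as a stand-in parser (BigCloneBench, for instance, is a Java benchmark), it assumes each parent-to-child AST edge counts as one state transition, and it uses Euclidean distance between each state's transition rows as an illustrative metric, since the abstract does not say how transitions or distances are defined. The names `STATES`, `build_markov_chain`, `euclidean`, and `pair_features` are our own.

```python
import ast
import math
from collections import Counter, defaultdict

# Fixed, ordered state space: every AST node class exposed by Python's ast module.
# A fixed list keeps the feature vector the same length for every code pair.
STATES = sorted(
    name for name, obj in vars(ast).items()
    if isinstance(obj, type) and issubclass(obj, ast.AST)
)


def build_markov_chain(source: str) -> dict:
    """Return {state: {next_state: probability}} built from parent->child AST edges.

    Assumption (not from the paper): each parent->child edge is one transition,
    so P(s -> t) = count(s -> t) / count(s -> any child).
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    chain = {}
    for state, row in counts.items():
        total = sum(row.values())
        chain[state] = {nxt: n / total for nxt, n in row.items()}
    return chain


def euclidean(p: dict, q: dict) -> float:
    """Euclidean distance between two sparse transition-probability rows (illustrative choice)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def pair_features(src_a: str, src_b: str) -> list:
    """One distance per state; a state missing from both chains contributes 0.0."""
    chain_a = build_markov_chain(src_a)
    chain_b = build_markov_chain(src_b)
    return [euclidean(chain_a.get(s, {}), chain_b.get(s, {})) for s in STATES]


if __name__ == "__main__":
    f1 = "def add(a, b):\n    return a + b\n"
    f2 = "def plus(x, y):\n    total = x + y\n    return total\n"
    features = pair_features(f1, f2)
    # Print only the states whose outgoing transitions differ between f1 and f2.
    for state, d in zip(STATES, features):
        if d > 0:
            print(f"{state}: {d:.3f}")
```

The resulting fixed-length vector (one value per state) is what would be handed to a machine learning classifier trained on labeled clone and non-clone pairs; the classifier itself is omitted here. Fixing the state space to every known node type is what keeps the feature length constant across pairs, and building each chain takes a single pass over the AST.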

Pages: 13