Graph-based code semantics learning for efficient semantic code clone detection

Times cited: 6

Authors:

Yu, Dongjin [1]
Yang, Quanxin [1]
Chen, Xin [1]
Chen, Jie [1]
Xu, Yihang [1]

Affiliation:

[1] Hangzhou Dianzi Univ, Coll Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2023 / Vol. 156

Funding:

National Natural Science Foundation of China;

Keywords:

Code clone detection; Code semantics learning; Graph matching network; Code graph representation; SEARCH;

DOI:

10.1016/j.infsof.2022.107130

CLC classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recent studies have shown that high-quality code semantics learning can effectively improve the performance of code clone detection. However, existing approaches suffer from two major drawbacks: (a) insufficient utilization of code representations, which leads to inefficient semantics learning, and (b) low clone-detection efficiency, which results in excessive detection time. We are therefore motivated to propose an efficient semantics learning method that also speeds up the detection process. Specifically, to address the first drawback, we adopt either the CFG (Control Flow Graph) or the PDG (Program Dependency Graph) as our initial code representation because of the rich semantic information they carry. Further, we propose a novel graph-based code semantics learning method that can capture critical information at the token, statement, edge, and graph levels. To address the second drawback, we design a Siamese graph-matching network based on attention mechanisms. It can uniformly generate graph embeddings for code fragments and facilitate parallel detection of semantic clones, thus significantly boosting the speed of semantic clone detection. We evaluated our approach on two Java benchmark datasets, Google Code Jam and BigCloneBench. The experimental results show that our model outperforms the SOTA (State-Of-The-Art) lightweight models and is over 20x faster in detection. In addition, our model performs on par with the large BERT-based models and is over 110x faster in detection. Our code and dataset are available online at: https://github.com/HduDBSI/CodeGraph4CCDetector.
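The abstract describes the detector only at a high level. As a rough illustration of why a Siamese design speeds up detection, here is a minimal, hypothetical Python sketch; it is not the authors' implementation, which lives in their repository. One shared-weight encoder embeds each code graph exactly once. Clone candidates are then found with cheap vector comparisons. A model whose embeddings depend on both members of a pair would instead have to rerun the full network for every pair. The class and function names (`SiameseGraphEncoder`, `embed`, `cosine`, `detect_clones`), the one-matrix message passing, the softmax attention readout, and the 0.9 threshold are all illustrative assumptions. They are generic stand-ins for the paper's token-, statement-, edge-, and graph-level learning, and the weights are random rather than trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Edge = Tuple[int, int]  # (source node, target node), e.g. a CFG or PDG edge
Graph = Tuple[List[Vector], List[Edge]]  # (node feature vectors, directed edges)


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SiameseGraphEncoder:
    """Hypothetical toy encoder: the SAME weights embed every graph (Siamese)."""

    def __init__(self, dim: int, rounds: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.rounds = rounds
        # Random (untrained) weight matrix shared by all message-passing rounds.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        # Random attention vector that scores nodes during the graph-level readout.
        self.att = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    def embed(self, feats: List[Vector], edges: List[Edge]) -> Vector:
        h = [list(f) for f in feats]
        for _ in range(self.rounds):
            agg = [list(v) for v in h]  # start from each node's own state (self-loop)
            for src, dst in edges:  # add messages along directed edges
                agg[dst] = [a + b for a, b in zip(agg[dst], h[src])]
            h = [[math.tanh(x) for x in matvec(self.w, v)] for v in agg]
        # Attention readout: softmax over node scores, then weighted sum of nodes.
        scores = [sum(a * x for a, x in zip(self.att, v)) for v in h]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [sum(e / z * v[i] for e, v in zip(exps, h)) for i in range(self.dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def detect_clones(enc: SiameseGraphEncoder, graphs: List[Graph],
                  threshold: float = 0.9) -> List[Tuple[int, int, float]]:
    """Embed each graph once (n encoder passes), then compare all pairs cheaply."""
    embs = [enc.embed(feats, edges) for feats, edges in graphs]
    pairs = []
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            s = cosine(embs[i], embs[j])
            if s >= threshold:
                pairs.append((i, j, s))
    return pairs


if __name__ == "__main__":
    enc = SiameseGraphEncoder(dim=4)
    a = ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [(0, 1)])
    b = ([[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]], [(1, 0)])  # a with nodes relabeled
    print(detect_clones(enc, [a, b]))
```

Because every fragment's embedding is independent of the others, the embedding step parallelizes trivially, and the pairwise loop could be replaced by a single matrix product of normalized embeddings on a GPU. In a real detector the weights would be learned from labeled clone pairs, and the threshold would be tuned on validation data.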

Pages: 13
