Graph-based code semantics learning for efficient semantic code clone detection

Cited by: 6
Authors
Yu, Dongjin [1]
Yang, Quanxin [1]
Chen, Xin [1]
Chen, Jie [1]
Xu, Yihang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Coll Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code clone detection; Code semantics learning; Graph matching network; Code graph representation; SEARCH;
DOI
10.1016/j.infsof.2022.107130
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recent studies have shown that high-quality code semantics learning can effectively improve the performance of code clone detection. However, existing approaches suffer from two major drawbacks: (a) insufficient utilization of code representations, leading to inefficient semantics learning, and (b) low efficiency of clone detection, resulting in long detection times. Therefore, we are motivated to propose an efficient semantics learning method while speeding up the detection process. Specifically, to address the first drawback, we adopt either CFG (Control Flow Graph) or PDG (Program Dependency Graph) as our initial code representation because of their rich semantic information. Further, we propose a novel graph-based code semantics learning method, which can capture critical information at the token, statement, edge, and graph levels. To address the second drawback, we design a Siamese graph-matching network based on attention mechanisms. It can uniformly generate graph embeddings for code fragments and facilitate parallel detection of semantic clones, thus significantly boosting the speed of semantic clone detection. We evaluated our approach on two Java benchmark datasets, Google Code Jam and BigCloneBench. The experimental results show that our model outperforms the SOTA (State-Of-The-Art) lightweight models and is over 20x faster in detection. In addition, our model performs on par with the large BERT-based models and is over 110x faster in detection. Our code and dataset are available online at: https://github.com/HduDBSI/CodeGraph4CCDetector.
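The speed-up described in the abstract rests on embedding each code fragment once and then comparing embeddings, rather than running a model on every pair. The following is a minimal sketch of that comparison step only, not the authors' implementation. The toy vectors, the fragment names, the use of cosine similarity, and the 0.9 threshold are all assumptions made for illustration; in the paper the embeddings come from the Siamese graph-matching network applied to CFGs or PDGs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_clone_pairs(embeddings, threshold=0.9):
    """Return fragment-name pairs whose similarity meets the threshold.

    `embeddings` maps a fragment name to a precomputed vector.
    Each fragment is embedded once; only cheap vector comparisons
    are done per pair. The threshold value is hypothetical.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine(embeddings[a], embeddings[b]) >= threshold
    ]


# Hypothetical graph embeddings (made-up numbers, not model output).
embeddings = {
    "sum_loop": [0.9, 0.1, 0.3],
    "sum_stream": [0.8, 0.2, 0.3],
    "parse_date": [0.1, 0.9, 0.2],
}

pairs = find_clone_pairs(embeddings)
print(pairs)
```

Because each pairwise comparison is independent of the others, this step can be batched or run in parallel, which is the property the abstract credits for faster detection.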
Pages: 13