Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization

Authors
Du, Yali [1]
Yu, Zhongxing [1]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
bug localization; semantic flow graph; type; computation role; pre-trained model; contrastive learning
DOI
10.1145/3611643.3616338
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Inspired by the success of pre-training in natural language processing, pre-trained models for programming languages have been widely used to advance code intelligence in recent years. In particular, BERT has been applied to bug localization with impressive results. However, these BERT-based bug localization techniques suffer from two issues. First, BERT models pre-trained on source code do not adequately capture the deep semantics of program code. Second, the overall bug localization models neglect the need for large-scale negative samples when learning contrastive representations of changesets, and they ignore the lexical similarity between bug reports and changesets during similarity estimation. We address these two issues by 1) proposing a novel directed, multiple-label code graph representation named Semantic Flow Graph (SFG), which compactly and adequately captures code semantics, 2) designing and training SemanticCodeBERT based on SFG, and 3) designing a novel Hierarchical Momentum Contrastive Bug Localization technique (HMCBL). Evaluation results show that our method achieves state-of-the-art performance in bug localization.
Pages: 579-591
Number of pages: 13