Code-aware fault localization with pre-training and interpretable machine learning

Cited by: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test cases' results (i.e., failing or passing), and have shown a remarkable ability to identify suspicious statements potentially responsible for failures. However, these studies mainly attend to the binary information produced by executing test cases and neglect to incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have dramatically improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. Meanwhile, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the promising learning ability of graph-based pre-training techniques to learn a feasible model that incorporates code snippets as well as their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we use 12 large-sized programs for the comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and is significantly better than 12 state-of-the-art baselines.
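To make the final ranking step of such a pipeline concrete, the following is a minimal, hypothetical Python sketch of interpretability-based suspiciousness scoring: it fits a plain logistic regression on binary coverage vectors labeled with test outcomes and treats each statement's learned weight as its suspiciousness. The coverage matrix, test labels, and logistic-regression surrogate are illustrative assumptions only; they do not reproduce CodeAwareFL's code snippets, propagation chains, or its graph-based pre-trained model.

```python
# Hypothetical sketch of interpretable-ML suspiciousness ranking.
# Not CodeAwareFL's implementation: a simple surrogate model over
# coverage data stands in for the paper's pre-trained graph model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows = test cases, columns = statements; 1 means the statement was executed.
coverage = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
])
# Test outcomes: 1 = failing, 0 = passing.
results = np.array([1, 0, 1, 0])

# Fit an interpretable model that predicts test outcome from coverage.
model = LogisticRegression().fit(coverage, results)

# Use each statement's learned weight as its suspiciousness score:
# larger positive weights mean execution correlates with failures.
suspiciousness = model.coef_[0]
for stmt in np.argsort(-suspiciousness):
    print(f"statement {stmt}: suspiciousness {suspiciousness[stmt]:+.3f}")
```

In this toy data, statement 1 is executed by exactly the failing tests, so the surrogate ranks it first; the actual technique replaces the coverage-only features with representations learned from code and propagation chains.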
Pages: 13