Code-aware fault localization with pre-training and interpretable machine learning

Cited by: 1
Authors
Zhang, Zhuo [1 ]
Li, Ya [2 ]
Yang, Sha [1 ]
Zhang, Zhanjun [3 ]
Lei, Yan [4 ]
Affiliations
[1] Guangzhou Coll Commerce, Sch Informat Technol & Engn, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, Ningbo Artificial Intelligence Inst, Ningbo, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[4] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Fault localization; Pre-training; Interpretable machine learning;
DOI
10.1016/j.eswa.2023.121689
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test case results (i.e., failing or passing), showing a strong ability to identify suspicious statements potentially responsible for failures. However, these approaches mainly rely on the binary information of executed test cases and do not incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have substantially improved the state of the art on a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement, while interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the learning ability of graph-based pre-training to build a model that incorporates code snippets and their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization: CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we choose 12 large-sized programs for the comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and significantly outperforms 12 state-of-the-art baselines.
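To make the learning-based FL setting described in the abstract concrete, the sketch below shows the generic idea of training on a binary coverage matrix with pass/fail labels and reading statement suspiciousness from an interpretable model. It is a minimal illustration only, not the authors' CodeAwareFL pipeline: the L1-regularized logistic regression, the toy coverage data, and the coefficient-based ranking are all assumptions standing in for CodeAwareFL's graph-based pre-trained encoder over code snippets and propagation chains and its interpretable-ML suspiciousness analysis.

# Minimal sketch (not the authors' implementation): learning-based fault
# localization over a binary coverage matrix, with an L1-regularized
# logistic regression standing in for the interpretable component.
import numpy as np
from sklearn.linear_model import LogisticRegression

# coverage[i, j] = 1 if test i executed statement j; labels: 1 = failing test.
# Toy data: statement s2 is the seeded fault and is covered by every failing test.
coverage = np.array([
    [1, 1, 1, 0, 0],   # t0  fail
    [1, 0, 1, 1, 0],   # t1  fail
    [1, 1, 0, 1, 0],   # t2  pass
    [1, 0, 0, 1, 1],   # t3  pass
    [0, 1, 1, 0, 1],   # t4  fail
    [1, 1, 0, 0, 1],   # t5  pass
])
labels = np.array([1, 1, 0, 0, 1, 0])

# Sparse (L1) weights keep the model easy to inspect: each statement gets one
# coefficient that can be read directly as a suspiciousness score.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(coverage, labels)

suspiciousness = model.coef_[0]
ranking = np.argsort(-suspiciousness)
for stmt in ranking:
    print(f"statement s{stmt}: suspiciousness = {suspiciousness[stmt]:.3f}")

In this toy run the faulty statement s2 receives the largest coefficient and is ranked first. CodeAwareFL differs in that the inputs are code snippets and their propagation chains encoded by a graph-based pre-trained model rather than raw coverage vectors, and suspiciousness is derived from the trained model through interpretable machine learning rather than from linear coefficients.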
Pages: 13