On Distribution Shift in Learning-based Bug Detectors

Authors
He, Jingxuan [1]
Beurer-Kellner, Luca [1]
Vechev, Martin [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g., >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by a distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our test set and the latest version of open source repositories. Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector.
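The abstract names focal loss as one of the ingredients used during training but does not give its formulation. As a rough illustration only, the sketch below implements the standard binary focal loss, which down-weights examples the model already classifies confidently. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not values taken from the paper or its repository.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for a single prediction.

    p     -- predicted probability that the sample is buggy (0..1)
    y     -- ground-truth label: 1 = buggy, 0 = not buggy
    gamma -- focusing parameter (assumed default; 0 removes the focusing term)
    alpha -- weight of the positive class (assumed default)
    """
    # Probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class-balancing weight for the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)  # confident and correct
    hard = binary_focal_loss(0.1, 1)  # confident and wrong
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With `gamma=0.0` the focusing term disappears and the function reduces to class-weighted cross-entropy, which makes it easy to compare the two losses on the same predictions.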
Pages: 22