On Distribution Shift in Learning-based Bug Detectors

Authors
He, Jingxuan [1]
Beurer-Kellner, Luca [1]
Vechev, Martin [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g., >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by a distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our test set and the latest version of open source repositories. Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector.
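The abstract names focal loss as one of the training ingredients but does not give its formulation. As a rough illustration only, the sketch below uses the widely used form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function name and the default values of gamma and alpha are assumptions made for this example and are not taken from the paper.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for a single example.

    p_true is the model's predicted probability for the example's true class.
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    # Clamp to avoid log(0) for degenerate predictions.
    p = min(max(p_true, 1e-12), 1.0)
    # The (1 - p)^gamma factor shrinks the loss of well-classified examples,
    # so rare, hard examples (e.g. real bugs) contribute relatively more.
    return -alpha * (1.0 - p) ** gamma * math.log(p)


easy = focal_loss(0.9)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1)  # confident and wrong: close to plain cross-entropy
```

With gamma set to 0 the expression reduces to ordinary cross-entropy, which is one way to see that the extra factor only reweights examples rather than changing what is being predicted.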
Pages: 22
Related papers
50 in total
  • [1] How About Bug-Triggering Paths? - Understanding and Characterizing Learning-Based Vulnerability Detectors
    Cheng, Xiao
    Nie, Xu
    Li, Ningke
    Wang, Haoyu
    Zheng, Zheng
    Sui, Yulei
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (02) : 542 - 558
  • [2] Bug characterization in machine learning-based systems
    Mohammad Mehdi Morovati
    Amin Nikanjam
    Florian Tambon
    Foutse Khomh
    Zhen Ming (Jack) Jiang
    Empirical Software Engineering, 2024, 29
  • [3] Bug characterization in machine learning-based systems
    Morovati, Mohammad Mehdi
    Nikanjam, Amin
    Tambon, Florian
    Khomh, Foutse
    Jiang, Zhen Ming
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (01)
  • [4] Deep learning-based software bug classification
    Meher, Jyoti Prakash
    Biswas, Sourav
    Mall, Rajib
    INFORMATION AND SOFTWARE TECHNOLOGY, 2024, 166
  • [5] Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control
    Kang, Katie
    Gradu, Paula
    Choi, Jason
    Janner, Michael
    Tomlin, Claire
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10708 - 10733
  • [6] On the Deterioration of Learning-Based Malware Detectors for Android
    Fu, Xiaoqin
    Cai, Haipeng
    2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2019), 2019, : 272 - 273
  • [7] Learning Realistic Mutations: Bug Creation for Neural Bug Detectors
    Richter, Cedric
    Wehrheim, Heike
    2022 IEEE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST 2022), 2022, : 162 - 173
  • [8] Investigating the Generalizability of Deep Learning-based Clone Detectors
    Choi, Eunjong
    Fuke, Norihiro
    Fujiwara, Yuji
    Yoshida, Norihiro
    Inoue, Katsuro
    2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC, 2023, : 181 - 185
  • [9] MPass: Bypassing Learning-based Static Malware Detectors
    Wang, Jialai
    Qu, Wenjie
    Rong, Yi
    Qiu, Han
    Li, Qi
    Li, Zongpeng
    Zhang, Chao
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [10] FDD: a deep learning-based steel defect detectors
    Akhyar, Fityanul
    Liu, Ying
    Hsu, Chao-Yung
    Shih, Timothy K.
    Lin, Chih-Yang
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2023, 126 (3-4) : 1093 - 1107