Sample-based software defect prediction with active and semi-supervised learning

Cited by: 158
Authors
Li, Ming [2]
Zhang, Hongyu [1]
Wu, Rongxin [1]
Zhou, Zhi-Hua [2]
Affiliations
[1] Tsinghua Univ, MOE Key Lab Informat Syst Secur, Beijing 100084, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
US National Science Foundation;
Keywords
Software defect prediction; Sampling; Quality assurance; Machine learning; Active semi-supervised learning; Static code attributes; Classification; Framework
DOI
10.1007/s10515-011-0092-1
Chinese Library Classification number
TP31 [Computer software];
Subject classification codes
081202; 0835
Abstract
Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict the defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner, and active sampling with an active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method, ACoForest, which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have the potential to be applied to industrial practice.
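The sample-based workflow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's ACoForest method: it starts from a small random sample, then repeatedly asks for the module on which a bootstrap committee of nearest-centroid learners disagrees most, and it omits the semi-supervised step that exploits unlabeled modules. The names `train`, `predict`, `defect_vote`, `sample_and_predict`, and `inspect`, as well as the synthetic two-feature data, are hypothetical.

```python
import random

def train(labeled):
    """Fit a nearest-centroid model: one mean feature vector per class."""
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_class.items()}

def predict(model, features):
    """Return the class (0 = clean, 1 = defective) of the closest centroid."""
    return min(model, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(features, model[label])))

def defect_vote(labeled, features, rng, members=7):
    """Fraction of a bootstrap committee that votes 'defective'."""
    votes = 0
    for _ in range(members):
        bootstrap = [rng.choice(labeled) for _ in labeled]
        votes += predict(train(bootstrap), features)
    return votes / members

def sample_and_predict(modules, inspect, initial=10, queries=10, seed=0):
    """Test a small sample of modules, then predict the rest.

    `inspect(i)` stands in for actually testing module i (assumption).
    """
    rng = random.Random(seed)
    # Step 1: random initial sample, labeled by testing.
    labels = {i: inspect(i) for i in rng.sample(range(len(modules)), initial)}
    # Step 2: active sampling, one module per round.
    for _ in range(queries):
        labeled = [(modules[i], y) for i, y in labels.items()]
        pool = [i for i in range(len(modules)) if i not in labels]
        # Most contentious module: committee vote closest to an even split.
        pick = min(pool, key=lambda i: abs(
            defect_vote(labeled, modules[i], rng) - 0.5))
        labels[pick] = inspect(pick)
    # Step 3: final model predicts every module that was never tested.
    model = train([(modules[i], y) for i, y in labels.items()])
    predictions = {i: predict(model, modules[i])
                   for i in range(len(modules)) if i not in labels}
    return labels, predictions

# Synthetic data (assumption): defective modules have larger metric values.
data_rng = random.Random(1)
truth = [1 if data_rng.random() < 0.3 else 0 for _ in range(100)]
modules = [[data_rng.gauss(3.0 * y, 1.0), data_rng.gauss(3.0 * y, 1.0)]
           for y in truth]
labels, predictions = sample_and_predict(modules, lambda i: truth[i])
print(len(labels), "modules tested;", len(predictions), "modules predicted")
```

Setting `queries=0` reduces the sketch to plain random sampling with a conventional learner, the first of the three settings the abstract lists.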
Pages: 201-230
Number of pages: 30
Related papers
50 in total
  • [41] Learning Safe Prediction for Semi-Supervised Regression
    Li, Yu-Feng
    Zha, Han-Wen
    Zhou, Zhi-Hua
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2217 - 2223
  • [42] ACTIVE SEMI-SUPERVISED LEARNING FOR DIFFUSIONS ON GRAPHS
    Das, Bishwadeep
    Isufi, Elvin
    Leus, Geert
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 9075 - 9079
  • [43] Semi-supervised Learning for Structured Output Prediction
    Levatić J.
    Informatica (Slovenia), 2022, 46 (04): : 583 - 584
  • [44] Effectiveness of semi-supervised learning in bankruptcy prediction
    Karlos, Stamatis
    Fazakis, Nikos
    Kotsiantis, Sotiris
    Sgarbas, Kyrgiakos
    2016 7TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA), 2016,
  • [45] Geostatistical semi-supervised learning for spatial prediction
    Fouedjio, Francky
    Talebi, Hassan
    ARTIFICIAL INTELLIGENCE IN GEOSCIENCES, 2022, 3 : 162 - 178
  • [46] Learning sample-aware threshold for semi-supervised learning
    Wei, Qi
    Feng, Lei
    Sun, Haoliang
    Wang, Ren
    He, Rundong
    Yin, Yilong
    MACHINE LEARNING, 2024, 113 (08) : 5423 - 5445
  • [47] Link prediction in graph construction for supervised and semi-supervised learning
    Berton, Lilian
    Valverde-Rebaza, Jorge
    Lopes, Alneu de Andrade
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [48] Hebbian semi-supervised learning in a sample efficiency setting
    Lagani, Gabriele
    Falchi, Fabrizio
    Gennaro, Claudio
    Amato, Giuseppe
    NEURAL NETWORKS, 2021, 143 : 719 - 731
  • [49] Semi-supervised Active Learning for Semi-supervised Models: Exploit Adversarial Examples with Graph-based Virtual Labels
    Guo, Jiannan
    Shi, Haochen
    Kang, Yangyang
    Kuang, Kun
    Tang, Siliang
    Jiang, Zhuoren
    Sun, Changlong
    Wu, Fei
    Zhuang, Yueting
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2876 - 2885
  • [50] Selecting Informative Universum Sample for Semi-Supervised Learning
    Chen, Shuo
    Zhang, Changshui
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1016 - 1021