Sample-based software defect prediction with active and semi-supervised learning

Cited by: 158
Authors
Li, Ming [2]
Zhang, Hongyu [1]
Wu, Rongxin [1]
Zhou, Zhi-Hua [2]
Affiliations
[1] Tsinghua Univ, MOE Key Lab Informat Syst Secur, Beijing 100084, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Software defect prediction; Sampling; Quality assurance; Machine learning; Active semi-supervised learning; Static code attributes; Classification; Framework
DOI
10.1007/s10515-011-0092-1
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict the defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner, and active sampling with an active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method, ACoForest, which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have the potential to be applied to industrial practice.
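The first of the three methods in the abstract, random sampling with a conventional learner, can be illustrated with a minimal sketch. It is not the paper's implementation and it is not ACoForest: the nearest-centroid learner is a stand-in chosen so the example needs only the standard library, and the module names, the two metrics, the 10% sampling rate, and the `inspect` oracle are all hypothetical.

```python
import random


def sample_based_prediction(modules, inspect, sample_rate=0.1, seed=0):
    """Randomly sample modules, label them by testing, predict the rest.

    modules: dict mapping module name -> list of numeric code metrics
    inspect: callable(name) -> bool, True if testing finds the module defective
             (stands in for the manual test/inspection effort)
    Returns (predicted, labeled): predictions for unsampled modules and the
    labels obtained for the sampled ones.
    """
    rng = random.Random(seed)
    names = sorted(modules)
    k = max(2, round(len(names) * sample_rate))
    sampled = rng.sample(names, k)

    # "Test" only the sampled modules to obtain their labels.
    labeled = {name: inspect(name) for name in sampled}

    # Stand-in learner: one centroid per class seen in the sample.
    groups = {}
    for name, label in labeled.items():
        groups.setdefault(label, []).append(modules[name])
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(features):
        # Assign the class whose centroid is closest (squared Euclidean).
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    predicted = {
        name: predict(modules[name]) for name in names if name not in labeled
    }
    return predicted, labeled


# Hypothetical system of 200 modules with two metrics: [lines of code, complexity].
truth = {f"module_{i:03d}": (i % 4 == 0) for i in range(200)}
modules = {
    name: [400 + i, 25] if truth[name] else [50 + i % 30, 3]
    for i, name in enumerate(sorted(truth))
}

predicted, labeled = sample_based_prediction(modules, inspect=truth.__getitem__)
print(len(labeled), "modules tested;", len(predicted), "modules predicted")
```

The semi-supervised and active variants described in the abstract would additionally use the unlabeled modules during training and choose which modules to test next; this sketch does neither.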
Pages: 201 - 230
Number of pages: 30
Related papers
  • [1] Sample-based software defect prediction with active and semi-supervised learning
    Li, Ming
    Zhang, Hongyu
    Wu, Rongxin
    Zhou, Zhi-Hua
    Automated Software Engineering, 2012, 19 : 201 - 230
  • [2] Label propagation based semi-supervised learning for software defect prediction
    Zhang, Zhi-Wu
    Jing, Xiao-Yuan
    Wang, Tie-Jian
    Automated Software Engineering, 2017, 24 (01) : 47 - 69
  • [3] An improved semi-supervised learning method for software defect prediction
    Ma, Ying
    Pan, Weiwei
    Zhu, Shunzhi
    Yin, Huayi
    Luo, Jian
    Journal of Intelligent & Fuzzy Systems, 2014, 27 (05) : 2473 - 2480
  • [4] A Semi-Supervised Approach to Software Defect Prediction
    Lu, Huihua
    Cukic, Bojan
    Culp, Mark
    2014 IEEE 38th Annual International Computers, Software and Applications Conference (COMPSAC), 2014 : 416 - 425
  • [5] Software Defect Prediction Using Semi-supervised Learning with Dimension Reduction
    Lu, Huihua
    Cukic, Bojan
    Culp, Mark
    2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2012 : 314 - 317
  • [6] An Integrated Semi-supervised Software Defect Prediction Model
    Meng, Fanqi
    Cheng, Wenying
    Wang, Jingdong
    Journal of Internet Technology, 2023, 24 (06) : 1307 - 1317
  • [7] Software Defect Prediction Using Semi-supervised Learning with Change Burst Information
    He, Qing
    Shen, Beijun
    Chen, Yuting
    Proceedings 2016 IEEE 40th Annual Computer Software and Applications Conference Workshops, Vol 1, 2016 : 113 - 122
  • [8] Protein Function Prediction Based on Active Semi-supervised Learning
    Wang, Xuesong
    Cheng, Yuhu
    Li, Lijing
    Chinese Journal of Electronics, 2016, 25 (04) : 595 - 600