Sample-based software defect prediction with active and semi-supervised learning

Cited by: 158
Authors
Li, Ming [2]
Zhang, Hongyu [1]
Wu, Rongxin [1]
Zhou, Zhi-Hua [2]
Affiliations
[1] Tsinghua Univ, MOE Key Lab Informat Syst Secur, Beijing 100084, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
US National Science Foundation;
Keywords
Software defect prediction; Sampling; Quality assurance; Machine learning; Active semi-supervised learning; STATIC CODE ATTRIBUTES; CLASSIFICATION; FRAMEWORK;
DOI
10.1007/s10515-011-0092-1
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict the defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner, and active sampling with an active semi-supervised learner. To facilitate the active sampling, we propose a novel active semi-supervised learning method, ACoForest, which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have the potential to be applied to industrial practice.
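The sample-based workflow in the abstract (select and test a small fraction of modules, train on them, predict the rest) can be illustrated with a minimal sketch of the random-sampling variant only. Everything here is an assumption for illustration: the data is synthetic, the two metrics (lines of code and cyclomatic complexity) are hypothetical, and the nearest-centroid classifier is a simple stand-in, not ACoForest or any learner evaluated in the paper.

```python
import random

def make_toy_modules(n, seed=0):
    """Generate synthetic (features, is_defective) pairs; NOT real project data."""
    rng = random.Random(seed)
    modules = []
    for _ in range(n):
        defective = rng.random() < 0.3
        # Hypothetical metrics: lines of code and cyclomatic complexity.
        loc = rng.gauss(400 if defective else 150, 60)
        complexity = rng.gauss(25 if defective else 8, 4)
        modules.append(((loc, complexity), defective))
    return modules

def centroid(rows):
    """Mean feature vector of a non-empty list of feature tuples."""
    if not rows:
        raise ValueError("sample must contain both clean and defective modules")
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def train(sample):
    """Fit a nearest-centroid model (stand-in learner) on the labeled sample."""
    clean = [x for x, y in sample if not y]
    buggy = [x for x, y in sample if y]
    return centroid(clean), centroid(buggy)

def predict(model, x):
    """Return True if x lies closer to the defective centroid."""
    clean_c, buggy_c = model
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return sq_dist(x, buggy_c) < sq_dist(x, clean_c)

def sample_based_prediction(modules, fraction=0.1, seed=1):
    """Randomly sample a fraction of modules, train on them, predict the rest."""
    rng = random.Random(seed)
    indices = list(range(len(modules)))
    rng.shuffle(indices)
    k = max(2, int(fraction * len(modules)))
    sampled, rest = indices[:k], indices[k:]
    # "Testing" the sampled modules is simulated by revealing their labels.
    model = train([modules[i] for i in sampled])
    predictions = {i: predict(model, modules[i][0]) for i in rest}
    return sampled, predictions

if __name__ == "__main__":
    modules = make_toy_modules(500)
    sampled, predictions = sample_based_prediction(modules, fraction=0.1)
    correct = sum(predictions[i] == modules[i][1] for i in predictions)
    print(f"labeled {len(sampled)} modules, predicted {len(predictions)}")
    print(f"accuracy on unsampled modules: {correct / len(predictions):.2f}")
```

An active-sampling variant would replace the random shuffle with a step that picks the modules the current model is least certain about; that selection logic is not sketched here.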
Pages: 201 - 230
Page count: 30
Related papers
50 in total
  • [21] Adversarial Learning for Cross-Project Semi-Supervised Defect Prediction
    Sun, Ying
    Jing, Xiao-Yuan
    Wu, Fei
    Li, Juanjuan
    Xing, Danlei
    Chen, Haowen
    Sun, Yanfei
    IEEE ACCESS, 2020, 8 : 32674 - 32687
  • [22] Analysis of active semi-supervised learning
    Berton, Lilian
    Mitsuishi, Felipe Baz
    Vega-Oliveros, Didier A.
    38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 1122 - 1129
  • [23] Adaptive Active Learning for Semi-supervised Learning
    Li Y.-C.
    Xiao F.
    Chen Z.
    Li B.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (12): : 3808 - 3822
  • [24] ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction
    Zhou, Yang
    Wang, Fang
    Feng, Dan
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,
  • [25] Semi-supervised learning for software quality estimation
    Seliya, N
    Khoshgoftaar, TM
    Zhong, S
    ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, : 183 - 190
  • [26] Combining Committee-Based Semi-Supervised Learning and Active Learning
    Mohamed Farouk Abdel Hady
    Friedhelm Schwenker
    Journal of Computer Science and Technology, 2010, 25 : 681 - 698
  • [27] Combining Committee-Based Semi-Supervised Learning and Active Learning
    Hady, Mohamed Farouk Abdel
    Schwenker, Friedhelm
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2010, 25 (04) : 681 - 698
  • [28] Combining Committee-Based Semi-Supervised Learning and Active Learning
    Mohamed Farouk Abdel Hady
    Friedhelm Schwenker
    Journal of Computer Science & Technology, 2010, 25 (04) : 681 - 698
  • [29] Semi-supervised batch active learning based on mutual information
    Ji, Xia
    Wang, Lingzhu
    Fang, Xiaohao
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [30] Interactive Cell Segmentation Based on Active and Semi-Supervised Learning
    Su, Hang
    Yin, Zhaozheng
    Huh, Seungil
    Kanade, Takeo
    Zhu, Jun
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2016, 35 (03) : 762 - 777