A systematic review of unsupervised learning techniques for software defect prediction

Cited by: 110
Authors
Li, Ning [1, 4]
Shepperd, Martin [2]
Guo, Yuchen [3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Brunel Univ London, Uxbridge UB8 3PH, Middx, England
[3] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, Xian 710049, Peoples R China
[4] Minist Ind & Informat Technol, Key Lab Big Data Storage & Management, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised learning; Software defect prediction; Machine learning; Systematic review; Meta-analysis; PERFORMANCE;
DOI
10.1016/j.infsof.2020.106287
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Background: Unsupervised machine learners have been increasingly applied to software defect prediction. This approach may be valuable for software practitioners because it reduces the need for labeled training data. Objective: Investigate the use and performance of unsupervised learning techniques in software defect prediction. Method: We conducted a systematic literature review that identified 49 studies, published between January 2000 and March 2018, that satisfied our inclusion criteria and contained 2456 individual experimental results. To compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews Correlation Coefficient (MCC) as our main performance measure. Results: Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among the 14 families of unsupervised models, Fuzzy C-Means (FCM) and Fuzzy SOMs (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% (262/2456) of published results (contained in 16 papers) were internally inconsistent, and a further 33% (823/2456) provided insufficient details for us to check. Conclusion: Although many factors impact the performance of a classifier (e.g., dataset characteristics), broadly speaking, unsupervised classifiers do not seem to perform worse than the supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks and (iii) incomplete reporting. We therefore encourage researchers to be comprehensive in their reporting.
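The abstract names the Matthews Correlation Coefficient, computed from confusion matrices, as the main performance measure. A minimal sketch of the standard MCC formula follows, for illustration only. The function name `mcc` and the choice to return 0.0 when the denominator is zero are assumptions made here; they are not taken from the paper.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative
    and false-negative counts. The result lies in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: when any marginal total is zero the coefficient is
    # undefined; this sketch returns 0.0 in that case.
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not data from the review.
    print(mcc(tp=30, fp=10, tn=50, fn=10))
```

Because MCC uses all four cells of the confusion matrix, a result reported only as, say, precision and recall needs the remaining cells to be reconstructed before it can be compared in this way.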
Pages: 15
Related papers
50 in total
  • [1] Software Defect Prediction Using Supervised Machine Learning Techniques: A Systematic Literature Review
    Matloob, Faseeha
    Aftab, Shabib
    Ahmad, Munir
    Khan, Muhammad Adnan
    Fatima, Areej
    Iqbal, Muhammad
    Alruwaili, Wesam Mohsen
    Elmitwally, Nouh Sabri
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2021, 29 (02) : 403 - 421
  • [2] A Systematic Review of Ensemble Techniques for Software Defect and Change Prediction
    Khanna, Megha
    [J]. E-INFORMATICA SOFTWARE ENGINEERING JOURNAL, 2022, 16 (01) : 1 - 41
  • [3] Software defect prediction using hybrid techniques: a systematic literature review
    Malhotra, Ruchika
    Chawla, Sonali
    Sharma, Anjali
    [J]. SOFT COMPUTING, 2023, 27 (12) : 8255 - 8288
  • [4] Software defect prediction using hybrid techniques: a systematic literature review
    Ruchika Malhotra
    Sonali Chawla
    Anjali Sharma
    [J]. Soft Computing, 2023, 27 : 8255 - 8288
  • [5] A Systematic Review on Software Defect Prediction
    Singh, Pradeep Kumar
    Agarwal, Dishti
    Gupta, Aakriti
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1793 - 1797
  • [6] A systematic review of machine learning techniques for software fault prediction
    Malhotra, Ruchika
    [J]. APPLIED SOFT COMPUTING, 2015, 27 : 504 - 518
  • [7] Software Defect Prediction Using Ensemble Learning: A Systematic Literature Review
    Matloob, Faseeha
    Ghazal, Taher M.
    Taleb, Nasser
    Aftab, Shabib
    Ahmad, Munir
    Khan, Muhammad Adnan
    Abbas, Sagheer
    Soomro, Tariq Rahim
    [J]. IEEE ACCESS, 2021, 9 : 98754 - 98771
  • [8] Software Risk Prediction: Systematic Literature Review on Machine Learning Techniques
    Mahmud, Mahmudul Hoque
    Nayan, Md Tanzirul Haque
    Ashir, Dewan Md Nur Anjum
    Kabir, Md Alamgir
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (22)
  • [9] A systematic literature review of machine learning techniques for software maintainability prediction
    Alsolai, Hadeel
    Roper, Marc
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2020, 119
  • [10] Unsupervised methods for Software Defect Prediction
    Ha, Duy-An
    Chen, Ting-Hsuan
    Yuan, Shyan-Ming
    [J]. SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019, : 49 - 55