A systematic review of unsupervised learning techniques for software defect prediction

Cited by: 110

Authors:

Li, Ning ^{[1,4]}

Shepperd, Martin ^{[2]}

Guo, Yuchen ^{[3]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China

[2] Brunel Univ London, Uxbridge UB8 3PH, Middx, England

[3] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, Xian 710049, Peoples R China

[4] Minist Ind & Informat Technol, Key Lab Big Data Storage & Management, Xian 710072, Peoples R China

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2020 / Volume 122

Funding:

National Natural Science Foundation of China;

Keywords:

Unsupervised learning; Software defect prediction; Machine learning; Systematic review; Meta-analysis; PERFORMANCE;

DOI:

10.1016/j.infsof.2020.106287

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Background: Unsupervised machine learners have been increasingly applied to software defect prediction. This approach may be valuable for software practitioners because it reduces the need for labeled training data. Objective: Investigate the use and performance of unsupervised learning techniques in software defect prediction. Method: We conducted a systematic literature review that identified 49 studies, published between January 2000 and March 2018, that satisfied our inclusion criteria and contained 2456 individual experimental results. To compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews Correlation Coefficient (MCC) as our main performance measure. Results: Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among the 14 families of unsupervised models, Fuzzy C-Means (FCM) and Fuzzy SOMs (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% (262/2456) of published results (contained in 16 papers) were internally inconsistent, and a further 33% (823/2456) provided insufficient details for us to check. Conclusion: Although many factors, e.g., dataset characteristics, affect the performance of a classifier, broadly speaking, unsupervised classifiers do not seem to perform worse than the supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks and (iii) incomplete reporting. We therefore encourage researchers to be comprehensive in their reporting.
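The abstract names the Matthews Correlation Coefficient, computed from confusion matrices, as the main performance measure. As a minimal sketch of the standard MCC definition (not the authors' own code), the function below takes the four confusion-matrix counts. The function name `mcc` and the convention of returning 0.0 when the denominator is zero are assumptions made here for illustration.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives
    and false negatives. The result lies in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: treat an undefined MCC (an empty row or column
        # in the confusion matrix) as 0.0, i.e. no better than chance.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(round(mcc(tp=6, fp=1, tn=3, fn=2), 3))
```

Because MCC uses all four cells of the confusion matrix, it can be recomputed from any published result that reports enough counts or rates to reconstruct the matrix, which is what makes a consistent cross-study comparison possible.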


Pages: 15
