Comparative analysis of statistical and machine learning methods for predicting faulty modules

Cited by: 53
Author
Malhotra, Ruchika [1]
Affiliation
[1] Delhi Technological University, Department of Software Engineering, Delhi 110042, India
Keywords
Software quality; Static code metrics; Logistic regression; Machine learning; Receiver Operating Characteristic (ROC) curve; ORIENTED DESIGN METRICS; SOFTWARE QUALITY; CLASSIFICATION MODELS; NEURAL-NETWORKS;
DOI
10.1016/j.asoc.2014.03.032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The demand for good-quality software has grown rapidly in the last few years. This has led to increased use of machine learning methods for analyzing and assessing public-domain data sets. These methods can be used to develop models for estimating software quality attributes such as fault proneness, maintenance effort, and testing effort. Predicting faults in the early phases of software development can help software practitioners focus the available testing resources on the weaker areas of the system. This paper analyzes and compares a statistical method (logistic regression) and six machine learning methods for fault prediction. These methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling, and Gene Expression Programming) are empirically validated to find the relationship between static code metrics and the fault proneness of a module. To assess and compare the models built using logistic regression and the machine learning methods, we used two publicly available data sets, AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis. The study confirms the predictive capability of machine learning methods for software fault prediction. The results show that the model built using the Decision Tree method achieves an AUC of 0.8 on AR1 and 0.9 on AR6, and outperforms the models built using logistic regression and the other machine learning methods. (C) 2014 Elsevier B.V. All rights reserved.
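As a rough illustration of the comparison criterion described in the abstract, the sketch below computes the AUC of one fault-prediction model from its predicted fault probabilities. It uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not the paper's tooling and not data from AR1 or AR6.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 = faulty module, 0 = non-faulty module
    scores: model's predicted probability that the module is faulty
    Each (faulty, non-faulty) pair scores 1 if the faulty module is ranked
    higher, 0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one non-faulty module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores for five modules (two faulty).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.1]
print(roc_auc(labels, scores))  # 5 of 6 pairs ranked correctly, about 0.833
```

Comparing two models then amounts to computing this value for each model's scores on the same labeled modules; the higher AUC indicates that the model ranks faulty modules above non-faulty ones more often. The pairwise loop is quadratic and is meant only to make the definition explicit.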
Pages: 286-297
Page count: 12