Comparative analysis of statistical and machine learning methods for predicting faulty modules

Cited by: 53
Author
Malhotra, Ruchika [1]
Affiliation
[1] Delhi Technol Univ, Dept Software Engn, Delhi 110042, India
Keywords
Software quality; Static code metrics; Logistic regression; Machine learning; Receiver Operating Characteristic (ROC) curve; ORIENTED DESIGN METRICS; SOFTWARE QUALITY; CLASSIFICATION MODELS; NEURAL-NETWORKS;
DOI
10.1016/j.asoc.2014.03.032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The demand for good-quality software has grown rapidly in the last few years. This is leading to an increase in the use of machine learning methods for analyzing and assessing public domain data sets. These methods can be used to develop models for estimating software quality attributes such as fault proneness, maintenance effort, and testing effort. Software fault prediction in the early phases of software development can help guide software practitioners to focus the available testing resources on the weaker areas during development. This paper analyses and compares a statistical method (logistic regression) and six machine learning methods for fault prediction. These methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling, and Gene Expression Programming) are empirically validated to find the relationship between static code metrics and the fault proneness of a module. To assess and compare the models predicted using the regression and the machine learning methods, we used two publicly available data sets, AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (measured from Receiver Operating Characteristic (ROC) analysis). The study confirms the predictive capability of the machine learning methods for software fault prediction. The results show that the Area Under the Curve of the model predicted using the Decision Tree method is 0.8 for AR1 and 0.9 for AR6, and that it is a better model than those predicted using logistic regression and the other machine learning methods. (C) 2014 Elsevier B.V. All rights reserved.
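As a rough illustration of the evaluation criterion named in the abstract, the sketch below computes the Area Under the ROC Curve for two sets of predicted fault-proneness scores. It is not the paper's code: the function name `roc_auc`, the labels, and both score lists are invented for illustration, and the data are not drawn from AR1 or AR6. The AUC is computed through its rank interpretation (the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one), which is equivalent to the area under the empirical ROC curve.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a faulty module, 0 for a non-faulty module.
    scores: predicted fault-proneness, higher means more likely faulty.
    Ties between a faulty and a non-faulty module count as half a win.
    """
    faulty = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    if not faulty or not clean:
        raise ValueError("need at least one faulty and one non-faulty module")

    wins = 0.0
    for f in faulty:
        for c in clean:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(faulty) * len(clean))


# Hypothetical ground truth and model outputs (illustrative only).
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.45, 0.4, 0.1, 0.8, 0.3, 0.5]
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9, 0.3, 0.1]

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

The model with the larger AUC ranks faulty modules above non-faulty ones more consistently, which is the sense in which the abstract calls one model better than another. The pairwise loop is quadratic in the number of modules; for larger data sets a sort-based computation would be preferable.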
Pages: 286 - 297
Page count: 12
Related papers
50 in total
  • [1] COMPARATIVE ANALYSIS OF MACHINE LEARNING AND STATISTICAL METHODS IN SOLAR ENERGY PREDICTION
    Pu, Zhiyong
    Xia, Pan
    Zhang, Lu
    Wang, Shuo
    Wang, Yun
    Min, Min
    [J]. Taiyangneng Xuebao/Acta Energiae Solaris Sinica, 2023, 44 (07) : 162 - 167
  • [2] Comparative study of statistical and machine learning methods for streetcar incident duration analysis
    Zhu, Siying
    [J]. INTERNATIONAL JOURNAL OF CRASHWORTHINESS, 2024, 29 (01) : 16 - 21
  • [3] Comparative Analysis of Machine Learning Methods for Predicting Energy Recovery from Waste
    Kulisz, Monika
    Kujawska, Justyna
    Cioch, Michal
    Cel, Wojciech
    Pizon, Jakub
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [4] Comparative analysis of regression and machine learning methods for predicting fault proneness models
    Singh, Yogesh
    Kaur, Arvinder
    Malhotra, Ruchika
    [J]. INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 35 (2-4) : 183 - 193
  • [5] Machine learning and statistical methods for predicting mortality in heart failure
    Mpanya, Dineo
    Celik, Turgay
    Klug, Eric
    Ntsinjana, Hopewell
    [J]. HEART FAILURE REVIEWS, 2021, 26 (03) : 545 - 552
  • [6] Machine learning and statistical methods for predicting mortality in heart failure
    Dineo Mpanya
    Turgay Celik
    Eric Klug
    Hopewell Ntsinjana
    [J]. Heart Failure Reviews, 2021, 26 : 545 - 552
  • [7] Comparative analysis of machine learning algorithms and statistical models for predicting crown width of Larix olgensis
    Qiu, Siyu
    Liang, Ruiting
    Wang, Yifu
    Luo, Mi
    Sun, Yujun
    [J]. EARTH SCIENCE INFORMATICS, 2022, 15 (04) : 2415 - 2429
  • [8] Comparative analysis of machine learning algorithms and statistical models for predicting crown width of Larix olgensis
    Siyu Qiu
    Ruiting Liang
    Yifu Wang
    Mi Luo
    Yujun Sun
    [J]. Earth Science Informatics, 2022, 15 : 2415 - 2429
  • [9] Analyzing the Effectiveness of Machine Learning Algorithms for Determining Faulty Classes: A Comparative Analysis
    Singh, Prabhpahul
    Malhotra, Ruchika
    Bansal, Siddartha
    [J]. 2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019, : 325 - 330
  • [10] Comparative Analysis of Machine Learning Methods for Predicting Robotized Incremental Metal Sheet Forming Force
    Ostasevicius, Vytautas
    Paleviciute, Ieva
    Paulauskaite-Taraseviciene, Agne
    Jurenas, Vytautas
    Eidukynas, Darius
    Kizauskiene, Laura
    [J]. SENSORS, 2022, 22 (01)