Data Mining Techniques for Software Effort Estimation: A Comparative Study

Cited by: 119
Authors
Dejaeger, Karel [1]
Verbeke, Wouter [1]
Martens, David [2]
Baesens, Bart [1, 3]
Affiliations
[1] Katholieke Univ Leuven, Dept Decis Sci & Informat Management, B-3000 Louvain, Belgium
[2] Univ Antwerp, Fac Appl Econ, B-2000 Antwerp, Belgium
[3] Univ Southampton, Sch Management, Highfield Southampton SO17 1BJ, Hants, England
Keywords
Data mining; software effort estimation; regression; COST ESTIMATION; FEEDFORWARD NETWORKS; EMPIRICAL VALIDATION; MUTUAL INFORMATION; EFFORT PREDICTION; FEATURE-SELECTION; NEURAL-NETWORKS; MODELS; CLASSIFICATION; ANALOGY;
DOI
10.1109/TSE.2011.55
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large-scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes such as project size, development-, and environment-related attributes, typically a significant increase in estimation accuracy can be obtained.
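As an illustration of the best-performing approach named in the abstract, the following is a minimal sketch of ordinary least squares regression on log-transformed data. It is not the authors' implementation: it assumes a single predictor (project size), and the function names `fit_log_ols` and `predict_effort`, the data values, and the power-law relationship used to generate them are all hypothetical.

```python
import math


def fit_log_ols(sizes, efforts):
    """Fit log(effort) = a + b * log(size) by ordinary least squares.

    Returns the intercept a and slope b in log space.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form OLS estimates for a single predictor.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_effort(a, b, size):
    """Map a log-space prediction back to the original effort scale."""
    return math.exp(a + b * math.log(size))


# Hypothetical, noise-free data generated as effort = 3 * size^1.1.
sizes = [10, 50, 100, 400, 1000]
efforts = [3.0 * s ** 1.1 for s in sizes]

a, b = fit_log_ols(sizes, efforts)
estimate = predict_effort(a, b, 200)
```

Taking logarithms of both variables turns a multiplicative relationship between size and effort into a linear one, which is the form a linear least squares fit can capture.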
Pages: 375-397
Page count: 23