GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Cited by: 134
Authors
Oliveira, Adriano L. I. [1]
Braga, Petronio L. [1]
Lima, Ricardo M. F. [1]
Cornelio, Marcio L. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil
Keywords
Software effort estimation; Genetic algorithms; Feature selection; Support vector regression; Regression; Effort prediction
DOI
10.1016/j.infsof.2010.05.009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods. In addition, it has been shown that the selection of the input features may also have an important influence on estimation accuracy.
Objective: This paper proposes and investigates the use of a genetic algorithm (GA) method to simultaneously (1) select an optimal input feature subset and (2) optimize the parameters of machine learning methods, aiming at a higher accuracy level for the software effort estimates.
Method: Simulations are carried out using six benchmark data sets of software projects, namely Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.
Results: In all data sets, the simulations have shown that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations have also demonstrated that the proposed method outperforms some methods reported in the recent literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis.
Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, this reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input parameters can be ignored without loss of accuracy in the estimations. (C) 2010 Elsevier B.V. All rights reserved.
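The idea stated in the Objective, a single GA chromosome that encodes both a feature subset and the learner's parameters, can be illustrated with a minimal sketch. This is not the authors' implementation. To stay within the Python standard library it swaps the paper's learners (SVR, MLP and others) for a k-nearest-neighbour regressor whose only parameter is k, and it scores candidates by leave-one-out mean absolute error. The function names, the bit layout, the GA settings and the synthetic data are all assumptions made for illustration.

```python
import random

def decode(chromosome, n_features, k_bits=3):
    """Split a bit list into (feature mask, k), with k in 1..2**k_bits."""
    mask = chromosome[:n_features]
    bits = chromosome[n_features:n_features + k_bits]
    k = 1 + int("".join(map(str, bits)), 2)
    return mask, k

def loo_error(chromosome, X, y, k_bits=3):
    """Leave-one-out mean absolute error of a k-NN regressor (stand-in learner)."""
    mask, k = decode(chromosome, len(X[0]), k_bits)
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:  # no feature selected: give the worst possible score
        return float("inf")
    total = 0.0
    for i in range(len(X)):
        # squared distances to all other samples, on the selected features only
        neighbours = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )[:k]
        prediction = sum(t for _, t in neighbours) / len(neighbours)
        total += abs(prediction - y[i])
    return total / len(X)

def run_ga(X, y, pop_size=20, generations=20, p_mut=0.05, k_bits=3, seed=0):
    """Evolve bit strings [feature mask | k bits]; lower error is better."""
    rng = random.Random(seed)
    length = len(X[0]) + k_bits
    cache = {}

    def fitness(c):
        key = tuple(c)
        if key not in cache:  # each distinct chromosome is evaluated once
            cache[key] = loo_error(c, X, y, k_bits)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            p1 = min(rng.sample(pop, 2), key=fitness)  # binary tournament
            p2 = min(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, length)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return best, fitness(best)

# Synthetic data (assumption): only feature 0 drives the target, the rest is noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(5)] for _ in range(30)]
y = [10.0 * row[0] for row in X]

best, err = run_ga(X, y)
mask, k = decode(best, n_features=5)
print("selected features:", mask, "k:", k, "LOO error:", round(err, 3))
```

Because the mask and the parameter bits sit in one chromosome, crossover and mutation explore both at once, which is the property the abstract attributes to the proposed method. Replacing `loo_error` with a cross-validated error of another regressor would leave the GA loop unchanged.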
Pages: 1155-1166
Number of pages: 12
Related papers
50 in total
  • [11] A GA-based feature selection and parameter optimization of an ANN in diagnosing breast cancer
    Ahmad, Fadzil
    Isa, Nor Ashidi Mat
    Hussain, Zakaria
    Osman, Muhammad Khusairi
    Sulaiman, Siti Noraini
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2015, 18 (04) : 861 - 870
  • [12] A GA-based feature selection and parameter optimization of an ANN in diagnosing breast cancer
    Fadzil Ahmad
    Nor Ashidi Mat Isa
    Zakaria Hussain
    Muhammad Khusairi Osman
    Siti Noraini Sulaiman
    [J]. Pattern Analysis and Applications, 2015, 18 : 861 - 870
  • [13] Towards effective feature selection in estimating software effort using machine learning
    Jadhav, Akshay
    Kumar Shandilya, Shishir
    [J]. JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2024, 36 (05)
  • [14] Simultaneous Feature with Support Vector Selection and Parameters Optimization Using GA-Based SVM Solve the Binary Classification
    Fei, Ye
    Min, Han
    [J]. 2016 FIRST IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND THE INTERNET (ICCCI 2016), 2016, : 426 - 433
  • [15] Deep Learning Model with GA-based Visual Feature Selection and Context Integration
    Mandal, Ranju
    Azam, Basim
    Verma, Brijesh
    Zhang, Mengjie
    [J]. 2021 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC 2021), 2021, : 288 - 295
  • [16] A GA-BASED FEATURE SELECTION AND ENSEMBLE LEARNING FOR HIGH-DIMENSIONAL DATASETS
    Xia, Pei-Yong
    Ding, Xiang-Qian
    Jiang, Bai-Ning
    [J]. PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 7 - +
  • [17] Machine Learning-based Software Effort Estimation : An Analysis
    Polkowski, Zdzislaw
    Vora, Jayneel
    Tanwar, Sudeep
    Tyagi, Sudhanshu
    Singh, Pradeep Kumar
    Singh, Yashwant
    [J]. PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTERS AND ARTIFICIAL INTELLIGENCE (ECAI-2019), 2019,
  • [18] An Extreme Learning Machine based Approach for Software Effort Estimation
    Shukla, Suyash
    Kumar, Sandeep
    [J]. ENASE: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL APPROACHES TO SOFTWARE ENGINEERING, 2021, : 47 - 57
  • [19] GA-based feature selection method for oversized data analysis in digital economy
    Lv, Yao
    Liu, Peng
    Wang, Juan
    Zhang, Yao
    Slowik, Adam
    Lv, Jianhui
    [J]. EXPERT SYSTEMS, 2024, 41 (01)
  • [20] Empirical Evaluation of Test Effort Efficiency of Software GA-based Regression Test Case Prioritization Strategy
    Musa, Samaila
    Sultan, Abu Bakar Md
    Abd Ghani, Abdul Azim
    Baharom, Salmi
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND TECHNOLOGY (ICAST'18), 2018, 2016