GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Cited by: 134
Authors
Oliveira, Adriano L. I. [1]
Braga, Petronio L. [1]
Lima, Ricardo M. F. [1]
Cornelio, Marcio L. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil
Keywords
Software effort estimation; Genetic algorithms; Feature selection; Support vector regression; Regression; Effort prediction
DOI
10.1016/j.infsof.2010.05.009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimate software development effort. Some works have demonstrated that the accuracy of software effort estimates strongly depends on the parameter values of these methods. In addition, it has been shown that the selection of input features may also have an important influence on estimation accuracy.
Objective: This paper proposes and investigates the use of a genetic algorithm (GA) method to simultaneously (1) select an optimal input feature subset and (2) optimize the parameters of machine learning methods, aiming at more accurate software effort estimates.
Method: Simulations are carried out using six benchmark data sets of software projects, namely Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.
Results: In all data sets, the simulations showed that the proposed GA-based method improved the performance of the machine learning methods. The simulations also demonstrated that the proposed method outperforms some recent methods reported in the literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in the analysis.
Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, it reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input parameters can be ignored without loss of accuracy in the estimates. (C) 2010 Elsevier B.V. All rights reserved.
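The idea of searching over a feature subset and a learner parameter at the same time can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a k-nearest-neighbour regressor as a stand-in for SVR and the other learners, uses synthetic data instead of the benchmark data sets, and every name in it (`make_data`, `knn_predict`, `fitness`, `run_ga`, the chromosome layout, the GA settings) is a hypothetical choice made for illustration only.

```python
import random

def make_data(rng, n=40, n_features=6):
    """Synthetic 'projects' (assumption): effort depends only on features 0 and 1."""
    X = [[rng.uniform(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [3.0 * row[0] + 2.0 * row[1] + rng.gauss(0, 0.05) for row in X]
    return X, y

def knn_predict(X, y, mask, k, i):
    """Predict y[i] as the mean of its k nearest other rows, using only selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    dists = sorted(
        (sum((X[i][j] - X[m][j]) ** 2 for j in cols), m)
        for m in range(len(X)) if m != i
    )
    neighbours = [m for _, m in dists[:k]]
    return sum(y[m] for m in neighbours) / len(neighbours)

def fitness(chrom, X, y, n_features):
    """Leave-one-out mean absolute error of the stand-in learner (lower is better)."""
    mask, k = chrom[:n_features], chrom[n_features]
    if not any(mask):
        return float("inf")  # an empty feature subset is treated as invalid
    errors = [abs(y[i] - knn_predict(X, y, mask, k, i)) for i in range(len(X))]
    return sum(errors) / len(errors)

def random_chromosome(rng, n_features, k_max):
    # Chromosome = one bit per feature, followed by one integer gene for the parameter k.
    return [rng.randint(0, 1) for _ in range(n_features)] + [rng.randint(1, k_max)]

def mutate(chrom, rng, n_features, k_max, rate=0.1):
    child = chrom[:]
    for j in range(n_features):
        if rng.random() < rate:
            child[j] = 1 - child[j]  # flip a feature-selection bit
    if rng.random() < rate:
        child[n_features] = rng.randint(1, k_max)  # resample the parameter gene
    return child

def crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)  # single-point crossover
    return a[:point] + b[point:]

def run_ga(X, y, rng, pop_size=20, generations=15, k_max=10):
    n_features = len(X[0])
    pop = [random_chromosome(rng, n_features, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y, n_features))
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                   rng, n_features, k_max)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = min(pop, key=lambda c: fitness(c, X, y, n_features))
    return best, fitness(best, X, y, n_features)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, err = run_ga(X, y, rng)
    print("feature mask:", best[:-1], "k:", best[-1], "LOO MAE:", round(err, 3))
```

The key design point is the single chromosome that holds both the feature mask and the learner parameter, so one fitness evaluation scores the pair jointly rather than tuning each in a separate pass.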
Pages: 1155-1166
Number of pages: 12
Related papers
50 in total
  • [31] GA-Based Feature Selection Method for Imbalanced Data with Application in Radio Signal Recognition
    Limin Du
    Yang Xu
    Jun Liu
    Fangli Ma
    [J]. International Journal of Computational Intelligence Systems, 2015, 8 : 39 - 47
  • [32] GA-Based Feature Selection Method for Imbalanced Data with Application in Radio Signal Recognition
    Du, Limin
    Xu, Yang
    Liu, Jun
    Ma, Fangli
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2015, 8 : 39 - 47
  • [33] A GA-Based Approach to ICA Feature Selection: An Efficient Method to Classify Microarray Datasets
    Liu, Kun-Hong
    Zhang, Jun
    Li, Bo
    Du, Ji-Xiang
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2009, PT 2, PROCEEDINGS, 2009, 5552 : 432 - +
  • [34] Combined optimization of feature selection and algorithm parameters in machine learning of language
    Daelemans, W
    Hoste, V
    De Meulder, F
    Naudts, B
    [J]. MACHINE LEARNING: ECML 2003, 2003, 2837 : 84 - 95
  • [35] A Two-Stage GA-Based sEMG Feature Selection Method for User-Independent Continuous Estimation of Elbow Angles
    Li, He
    Guo, Shuxiang
    Bu, Dongdong
    Wang, Hanze
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [36] Review and Empirical Analysis of Machine Learning-Based Software Effort Estimation
    Rahman, Mizanur
    Sarwar, Hasan
    Kader, MD. Abdul
    Goncalves, Teresa
    Tin, Ting Tin
    [J]. IEEE ACCESS, 2024, 12 : 85661 - 85680
  • [37] Improving Software Regression Testing Using a Machine Learning-Based Method for Test Type Selection
    Al-Sabbagh, Khaled Walid
    Staron, Miroslaw
    Hebig, Regina
    [J]. PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2022, 2022, 13709 : 480 - 496
  • [38] A Machine Learning-Based Wrapper Method for Feature Selection
    Patel, Damodar
    Saxena, Amit
    Wang, John
    [J]. INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2024, 20 (01)
  • [39] Incorporating statistical and machine learning techniques into the optimization of correction factors for software development effort estimation
    Nhung, Ho Le Thi Kim
    Van Hai, Vo
    Silhavy, Petr
    Prokopova, Zdenka
    Silhavy, Radek
    [J]. JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2023,
  • [40] Incorporating statistical and machine learning techniques into the optimization of correction factors for software development effort estimation
    Ho Le Thi Kim Nhung
    Vo Van Hai
    Silhavy, Petr
    Prokopova, Zdenka
    Silhavy, Radek
    [J]. JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2024, 36 (05)