GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Cited by: 134
Authors
Oliveira, Adriano L. I. [1]
Braga, Petronio L. [1]
Lima, Ricardo M. F. [1]
Cornelio, Marcio L. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil
Keywords
Software effort estimation; Genetic algorithms; Feature selection; Support vector regression; Regression; Effort prediction
DOI
10.1016/j.infsof.2010.05.009
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor in the efficient allocation of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the accuracy of software effort estimates depends strongly on the parameter values of these methods. In addition, it has been shown that the selection of input features may also have an important influence on estimation accuracy. Objective: This paper proposes and investigates a genetic algorithm (GA) method that simultaneously (1) selects an optimal input feature subset and (2) optimizes the parameters of machine learning methods, aiming at more accurate software effort estimates. Method: Simulations are carried out using six benchmark data sets of software projects, namely, Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models. Results: In all data sets, the simulations have shown that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations have also demonstrated that the proposed method outperforms some methods reported in the recent literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in the analysis.
Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, it reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input features can be ignored without loss of accuracy in the estimations. (C) 2010 Elsevier B.V. All rights reserved.
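The idea of one chromosome carrying both a feature mask and a learner parameter can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a plain k-nearest-neighbour regressor for the SVR, MLP and tree learners named in the abstract, assumes leave-one-out mean magnitude of relative error (MMRE) as the fitness, and runs on synthetic data rather than the benchmark data sets. All function names and GA settings here are hypothetical.

```python
import random

def knn_predict(train_X, train_y, x, k, mask):
    """Mean effort of the k nearest neighbours, using only the masked features."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)

def loo_mmre(X, y, mask, k):
    """Leave-one-out MMRE (lower is better); assumes all targets are non-zero."""
    if not any(mask):
        return float("inf")  # a chromosome that selects no feature is invalid
    errors = []
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k, mask)
        errors.append(abs(y[i] - pred) / abs(y[i]))
    return sum(errors) / len(errors)

def fitness(chrom, X, y):
    mask, k = chrom  # chromosome = (feature mask, learner parameter)
    return loo_mmre(X, y, mask, k)

def crossover(a, b, rng):
    """One-point crossover on the mask (needs >= 2 features); k comes from either parent."""
    point = rng.randint(1, len(a[0]) - 1)
    return (a[0][:point] + b[0][point:], rng.choice([a[1], b[1]]))

def mutate(chrom, max_k, rng, rate=0.1):
    """Flip each mask bit, and resample k, with probability `rate`."""
    mask = [(not bit) if rng.random() < rate else bit for bit in chrom[0]]
    k = rng.randint(1, max_k) if rng.random() < rate else chrom[1]
    return (mask, k)

def run_ga(X, y, pop_size=20, generations=15, max_k=5, seed=0):
    """Evolve (mask, k) pairs; the better half survives unchanged each generation."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [([rng.random() < 0.5 for _ in range(n)], rng.randint(1, max_k))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y))
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), max_k, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = min(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y)

def make_toy_data(n=30, seed=1):
    """Synthetic projects: effort depends on features 0 and 1; 2 and 3 are noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(n)]
    y = [10 + 50 * row[0] + 30 * row[1] for row in X]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    (mask, k), err = run_ga(X, y)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("k:", k, "leave-one-out MMRE:", round(err, 3))
```

Encoding the mask and the parameter in one chromosome lets a single fitness evaluation score both choices together, which is the sense in which the selection and the tuning are simultaneous.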
Pages: 1155 - 1166
Page count: 12