GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Cited by: 134

Authors:

Oliveira, Adriano L. I. [1]

Braga, Petronio L. [1]

Lima, Ricardo M. F. [1]

Cornelio, Marcio L. [1]

Affiliations:

[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2010 / Vol. 52 / Issue 11

Keywords:

Software effort estimation; Genetic algorithms; Feature selection; Support vector regression; Regression; EFFORT PREDICTION;

DOI:

10.1016/j.infsof.2010.05.009

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied to estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods. In addition, it has been shown that the selection of the input features may also have an important influence on estimation accuracy.
Objective: This paper proposes and investigates the use of a genetic algorithm (GA) method to simultaneously (1) select an optimal input feature subset and (2) optimize the parameters of machine learning methods, aiming at a higher accuracy level for software effort estimates.
Method: Simulations are carried out using six benchmark data sets of software projects, namely, Desharnais, NASA, COCOMO, Albrecht, Kemerer and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.
Results: In all data sets, the simulations have shown that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations have also demonstrated that the proposed method outperforms some recent methods reported in the literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in the analysis.
Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, it reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input parameters can be ignored without loss of accuracy in the estimations. (C) 2010 Elsevier B.V. All rights reserved.
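The idea of searching for a feature subset and model parameters at the same time can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a chromosome made of a Boolean feature mask plus one model parameter, swaps in a plain k-nearest-neighbour regressor for the SVR and neural network models named in the abstract, and uses leave-one-out mean absolute error as the fitness. All function names, the synthetic data, and the GA settings are hypothetical choices made for illustration.

```python
import random

def knn_predict(train_X, train_y, x, mask, k):
    """Mean target of the k nearest training rows, using only the masked features."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m), t)
        for row, t in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)

def fitness(chrom, X, y):
    """Leave-one-out mean absolute error (assumed metric); lower is better."""
    mask, k = chrom
    if not any(mask):  # an empty feature subset cannot be evaluated
        return float("inf")
    err = 0.0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], mask, k)
        err += abs(pred - y[i])
    return err / len(X)

def random_chrom(n_features, max_k, rng):
    """Chromosome = (feature mask, model parameter k)."""
    return ([rng.random() < 0.5 for _ in range(n_features)], rng.randint(1, max_k))

def crossover(a, b, rng):
    """Uniform crossover on the mask; the parameter is inherited from one parent."""
    mask = [rng.choice(pair) for pair in zip(a[0], b[0])]
    return (mask, rng.choice((a[1], b[1])))

def mutate(chrom, max_k, rng, rate=0.1):
    """Flip mask bits and resample k, each with probability `rate`."""
    mask = [(not bit) if rng.random() < rate else bit for bit in chrom[0]]
    k = rng.randint(1, max_k) if rng.random() < rate else chrom[1]
    return (mask, k)

def run_ga(X, y, pop_size=20, generations=30, max_k=5, seed=0):
    """Evolve (mask, k) pairs; the better half survives each generation (elitism)."""
    rng = random.Random(seed)
    pop = [random_chrom(len(X[0]), max_k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y))
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), max_k, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = min(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y)

# Synthetic data (hypothetical): the target depends on feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(20)]
y = [10.0 * row[0] for row in X]

best, best_err = run_ga(X, y)
print("selected features:", [i for i, bit in enumerate(best[0]) if bit])
print("k:", best[1], "leave-one-out MAE:", best_err)
```

Encoding the mask and the parameter in one chromosome lets a single fitness evaluation score both choices together, which is the coupling the abstract describes. A real study would replace the stand-in regressor and error measure with the models and accuracy metrics used in the paper.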

Pages: 1155-1166

Page count: 12
