Review and Empirical Analysis of Machine Learning-Based Software Effort Estimation

Cited by: 0

Authors:

Rahman, Mizanur [1]

Sarwar, Hasan [2]

Kader, MD. Abdul [3]

Goncalves, Teresa [4]

Tin, Ting Tin [5]

Affiliations:

[1] Western Illinois Univ, Sch Comp Sci, Macomb, IL 61455 USA

[2] United Int Univ, Dept Comp Sci & Engn, Dhaka 1212, Bangladesh

[3] Univ Malaysia Pahang Al Sultan Abdullah, Fac Comp, Pekan 26600, Malaysia

[4] Univ Evora, Dept Informat, P-7004516 Evora, Portugal

[5] INTI Int Univ, Fac Data Sci & Informat Technol, Nilai 71800, Malaysia

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Estimation; Machine learning algorithms; Software reliability; Software algorithms; Research and development; Software development management; Linear regression; Support vector machines; Random forests; Software effort estimation; software development efforts estimation; linear regression; support vector machine; random forest; LASSO; KNN; R&D investment;

DOI:

10.1109/ACCESS.2024.3404879

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

The average software company spends a substantial share of its revenue on Research and Development (R&D) aimed at delivering software on time. Accurate software effort estimation (SEE) is critical for successful project planning, resource allocation, and on-time delivery within budget for sustainable software development. However, both overestimation and underestimation pose significant challenges, which highlights the need for continuous improvement in estimation techniques. This study reviews recent machine learning approaches used to improve the accuracy of SEE, focusing on research published between 2020 and 2023. The literature review followed a systematic approach to identify relevant research on machine learning techniques for SEE. Additionally, comparative experiments were conducted using five commonly employed Machine Learning (ML) methods: K-Nearest Neighbor, Support Vector Machine, Random Forest, Logistic Regression, and LASSO Regression. The performance of these techniques was evaluated using five widely adopted accuracy metrics: Mean Squared Error (MSE), Mean Magnitude of Relative Error (MMRE), R-squared, Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The evaluation was carried out on seven benchmark datasets: Albrecht, Desharnais, China, Kemerer, Miyazaki94, Maxwell, and COCOMO, which are publicly available and extensively used in SEE research. By carefully reviewing study quality, analyzing results across the literature, and rigorously evaluating experimental outcomes, the study draws conclusions about the most promising techniques for achieving state-of-the-art accuracy in estimating software effort.
This study makes three key contributions to the field: first, it provides a thorough overview of recent machine learning research in SEE; second, it offers data-driven guidance for researchers and practitioners selecting methods for accurate effort estimation; and third, it demonstrates the performance of these methods on publicly available datasets through experimental analysis. Better estimation supports the development of better predictive models for software project time, cost, and staffing needs. The findings aim to guide future research and tool development toward the most accurate machine learning approaches for modelling software development effort, costs, and delivery schedules, ultimately contributing to more efficient and cost-effective software projects.
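
For readers unfamiliar with the five accuracy metrics named in the abstract, the following is a minimal sketch of how they are commonly computed. It uses the standard textbook definitions, which is an assumption: the abstract does not state the exact formulas the authors used. The function name `see_metrics` and the sample effort values are hypothetical and chosen only for illustration.

```python
import math


def see_metrics(actual, predicted):
    """Compute MSE, RMSE, MMRE, MAPE (%) and R-squared for paired effort values.

    Standard definitions are assumed; actual effort values must be positive
    because the relative-error metrics divide by them.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual effort values must be positive")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Magnitude of relative error (MRE) per project, then its mean (MMRE).
    mre = [abs(e) / a for e, a in zip(errors, actual)]
    mmre = sum(mre) / n
    # Under these definitions MAPE is MMRE expressed as a percentage.
    mape = 100.0 * mmre

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MSE": mse, "RMSE": rmse, "MMRE": mmre, "MAPE": mape, "R2": r2}


# Hypothetical effort values (e.g. person-hours), not taken from any dataset.
metrics = see_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MSE, RMSE, MMRE, and MAPE indicate better estimates, while R-squared closer to 1 indicates that the model explains more of the variance in actual effort.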


Pages: 85661-85680

Page count: 20
