Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection

Cited by: 38

Authors:

Jain, Shivani [1]

Saha, Anju [1]

Affiliations:

[1] GGS Indraprastha Univ, USIC&T, Sect 16 C, Delhi 110078, India

Source:

SCIENCE OF COMPUTER PROGRAMMING | 2021 / Vol. 212

Keywords:

Code smell; Machine learning; Ensemble machine learning; Hybrid feature selection; Stacking; CLASSIFIER; REGRESSION; DESIGN;

DOI:

10.1016/j.scico.2021.102713

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Maintaining large and complex software is a significant task in the IT industry. One reason is the emergence of code smells, which are design flaws that lead to future bugs and errors. Code smells can be treated with regular refactoring, and their detection is the first step in the software maintenance process. Detecting code smells with machine learning algorithms eliminates the need for extensive knowledge of code smell properties and threshold values. Ensemble machine learning algorithms combine several classifiers, of the same or different types, to further improve performance and reduce variance. In our study, three hybrid feature selection techniques are employed with ensemble machine learning algorithms to improve performance in detecting code smells. Seven machine learning classifiers with different kernel variations, along with three boosting designs, two stacking methods, and bagging, were implemented. For feature selection, combinations of filter-wrapper, filter-embedded, and wrapper-embedded methods were executed. Performance measures for detecting four code smells are evaluated and compared with the performance obtained without feature selection. After applying hybrid feature selection, performance improved: accuracy increased by 21.43%, AUC by 53.24%, and F-measure by 76.06%. Univariate ROC with Lasso is the best hybrid feature selection technique, with 90.48% accuracy and a 94.5% ROC AUC value. Random Forest and Logistic Regression are the best-performing machine learning classifiers. Data class is the most detectable code smell. Stacking always gave better results than the individual classifiers. (C) 2021 Elsevier B.V. All rights reserved.
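
The filter-embedded combination named in the abstract (univariate ROC followed by Lasso) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the AUC threshold of 0.6, and the Lasso penalty alpha = 0.1 are all assumptions chosen for illustration, and the Lasso step is a plain coordinate-descent solver written with the Python standard library only.

```python
def auc_score(values, labels):
    """ROC AUC of one feature used directly as a score (ties count as 0.5)."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_filter(X, y, threshold=0.6):
    """Filter stage: keep feature indices whose direction-agnostic AUC >= threshold."""
    keep = []
    for j in range(len(X[0])):
        auc = auc_score([row[j] for row in X], y)
        if max(auc, 1.0 - auc) >= threshold:
            keep.append(j)
    return keep


def standardize(col):
    """Scale a column to zero mean and unit variance (constant columns become zeros)."""
    mean = sum(col) / len(col)
    std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
    return [(v - mean) / std for v in col]


def lasso_select(X, y, features, alpha=0.1, n_iter=200):
    """Embedded stage: coordinate-descent Lasso; keep features with nonzero weight."""
    n = len(y)
    cols = [standardize([row[j] for row in X]) for j in features]
    y_mean = sum(y) / n
    resid = [t - y_mean for t in y]  # residual while all weights are zero
    w = [0.0] * len(cols)
    for _ in range(n_iter):
        for k, col in enumerate(cols):
            # Correlation of this column with the residual that excludes its own term.
            rho = sum(c * (r + w[k] * c) for c, r in zip(col, resid)) / n
            # Soft-thresholding update for the L1 penalty.
            new_w = max(abs(rho) - alpha, 0.0) * (1 if rho > 0 else -1)
            if new_w != w[k]:
                resid = [r - (new_w - w[k]) * c for r, c in zip(resid, col)]
                w[k] = new_w
    return [j for j, wk in zip(features, w) if wk != 0.0]


# Toy data (hypothetical): column 0 is informative, column 1 is noise,
# column 2 is informative but largely redundant with column 0.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [[1, 1, 1], [2, 2, 1], [3, 1, 2], [4, 2, 2],
     [5, 1, 2], [6, 2, 2], [7, 1, 3], [8, 2, 3]]

kept = roc_filter(X, y)               # filter stage
selected = lasso_select(X, y, kept)   # embedded stage on the survivors
print(kept)      # [0, 2]
print(selected)  # [0]
```

On this toy data the filter discards the noise column, and the Lasso penalty then drops the redundant column, which is the general motivation for chaining a cheap filter with an embedded method. A classifier or stacking ensemble would be trained on the surviving features afterwards; that step is not shown here.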


Pages: 34
