Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection

Cited by: 38
Authors
Jain, Shivani [1]
Saha, Anju [1]
Affiliations
[1] GGS Indraprastha Univ, USIC&T, Sect 16 C, Delhi 110078, India
Keywords
Code smell; Machine learning; Ensemble machine learning; Hybrid feature selection; Stacking; CLASSIFIER; REGRESSION; DESIGN;
DOI
10.1016/j.scico.2021.102713
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Maintaining large and complex software is a significant task in the IT industry. One reason is the emergence of code smells, which are design flaws that lead to future bugs and errors. Code smells can be treated with regular refactoring, and detecting them is the first step in the software maintenance process. Detecting code smells with machine learning algorithms eliminates the need for extensive knowledge of code smell properties and threshold values. Ensemble machine learning algorithms combine several classifiers of the same or different types to further improve performance and reduce variance. In our study, three hybrid feature selection techniques are employed with ensemble machine learning algorithms to improve performance in detecting code smells. Seven machine learning classifiers with different kernel variations, along with three boosting designs, two stacking methods, and bagging, were implemented. For feature selection, combinations of filter-wrapper, filter-embedded, and wrapper-embedded methods were applied. Performance measures for detecting four code smells are evaluated and compared with the performance obtained without feature selection. After applying hybrid feature selection, performance improved: accuracy by 21.43%, AUC by 53.24%, and F-measure by 76.06%. Univariate ROC with Lasso is the best hybrid feature selection technique, with 90.48% accuracy and a 94.5% ROC AUC value. Random Forest and Logistic Regression are the best-performing machine learning classifiers. Data class is the most detectable code smell. Stacking always gave better results than individual classifiers. (C) 2021 Elsevier B.V. All rights reserved.
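The abstract names univariate ROC combined with Lasso as the best hybrid technique but does not spell out the procedure. As a rough illustration only, the filter stage of such a filter-embedded hybrid might look like the sketch below: each feature is scored by its own ROC AUC against the smell label, and features below a cutoff are dropped before an embedded method (Lasso, not shown here) would prune further. The metric names, the toy data, and the 0.7 threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def univariate_roc_filter(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.7,  # assumed cutoff, not from the paper
) -> List[str]:
    """Keep features whose single-feature AUC is far enough from chance."""
    kept = []
    for name, values in columns.items():
        auc = roc_auc(values, labels)
        # A feature that ranks samples inversely (AUC < 0.5) is just as
        # informative as one that ranks them directly, so fold around 0.5.
        if max(auc, 1.0 - auc) >= threshold:
            kept.append(name)
    return kept


# Hypothetical toy data: 1 = class flagged as a Data Class smell, 0 = clean.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "num_accessors": [12, 9, 15, 2, 3, 1],
    "loc": [120, 80, 200, 90, 150, 60],
    "cohesion": [0.1, 0.2, 0.15, 0.8, 0.9, 0.7],
}

selected = univariate_roc_filter(columns, labels)
print(selected)
```

In a full hybrid pipeline, the surviving columns would then be passed to an L1-regularised model, and only features with non-zero coefficients would be kept for the final classifiers.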
Pages: 34