Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection

Cited by: 38
Authors
Jain, Shivani [1]
Saha, Anju [1]
Affiliation
[1] GGS Indraprastha Univ, USIC&T, Sect 16 C, Delhi 110078, India
Keywords
Code smell; Machine learning; Ensemble machine learning; Hybrid feature selection; Stacking; CLASSIFIER; REGRESSION; DESIGN
DOI
10.1016/j.scico.2021.102713
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Maintaining large and complex software is a significant task in the IT industry. One reason is the emergence of code smells, design flaws that lead to future bugs and errors. Code smells can be treated through regular refactoring, and detecting them is the first step in the software maintenance process. Detecting code smells with machine learning algorithms eliminates the need for extensive knowledge of code smell properties and threshold values. Ensemble machine learning algorithms combine several classifiers of the same or different types to further improve performance and reduce variance. In this study, three hybrid feature selection techniques are combined with ensemble machine learning algorithms to improve code smell detection performance. Seven machine learning classifiers with different kernel variations were implemented, along with three boosting designs, two stacking methods, and bagging. For feature selection, filter-wrapper, filter-embedded, and wrapper-embedded combinations were applied. Performance measures for detecting four code smells are evaluated and compared with the performance obtained without feature selection. Applying hybrid feature selection increased accuracy by 21.43%, AUC by 53.24%, and F-measure by 76.06%. Univariate ROC with Lasso is the best hybrid feature selection technique, with 90.48% accuracy and a 94.5% ROC AUC value. Random Forest and Logistic Regression are the best-performing machine learning classifiers. Data Class is the most detectable code smell. Stacking always gave better results than individual classifiers. (C) 2021 Elsevier B.V. All rights reserved.
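The abstract names univariate ROC followed by Lasso as the best filter-embedded hybrid. The sketch below illustrates that two-stage idea in plain Python under stated assumptions, not as the authors' implementation: features are stored column-major; the filter keeps a metric when its univariate ROC AUC (or 1 - AUC, for inversely related metrics) reaches a hypothetical threshold of 0.6; the embedded stage fits a squared-loss Lasso by coordinate descent on the surviving standardized metrics and keeps those with nonzero weight. The threshold, the `alpha` penalty, and all function names are hypothetical, and the paper may use a different Lasso variant (for example L1-regularized logistic regression) and different settings.

```python
from typing import List, Sequence, Tuple


def roc_auc(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Univariate ROC AUC of one metric: probability that a randomly chosen
    smelly sample (label 1) has a higher value than a clean one (label 0).
    Ties count as one half (Mann-Whitney formulation)."""
    pos = [x for x, y in zip(feature, labels) if y == 1]
    neg = [x for x, y in zip(feature, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def standardize(columns: List[List[float]]) -> List[List[float]]:
    """Scale each column to zero mean and unit (population) variance;
    constant columns become all zeros."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return out


def soft_threshold(z: float, alpha: float) -> float:
    """Proximal operator of the L1 penalty."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_weights(columns: List[List[float]], target: List[float],
                  alpha: float, n_iter: int = 200) -> List[float]:
    """Squared-loss Lasso, (1/2n)||y - b - Xw||^2 + alpha*||w||_1, fitted by
    cyclic coordinate descent. Assumes standardized columns, so the
    intercept b is simply mean(target)."""
    n = len(target)
    y_mean = sum(target) / n
    resid = [t - y_mean for t in target]  # residuals while all weights are 0
    w = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha)
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
            w[j] = new_w
    return w


def hybrid_roc_lasso(columns: List[List[float]], labels: List[int],
                     auc_threshold: float = 0.6,
                     alpha: float = 0.1) -> Tuple[List[int], List[int]]:
    """Filter-embedded hybrid: return (indices kept by the ROC filter,
    indices with nonzero Lasso weight among those)."""
    # Stage 1 (filter): score each metric on its own.
    kept = []
    for j, col in enumerate(columns):
        auc = roc_auc(col, labels)
        if max(auc, 1.0 - auc) >= auc_threshold:
            kept.append(j)
    if not kept:
        return kept, []
    # Stage 2 (embedded): Lasso on the surviving, standardized metrics.
    std_cols = standardize([[float(v) for v in columns[j]] for j in kept])
    weights = lasso_weights(std_cols, [float(y) for y in labels], alpha)
    selected = [j for j, w in zip(kept, weights) if abs(w) > 1e-8]
    return kept, selected


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples, 3 metrics stored column-major.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    features = [
        [1, 2, 3, 4, 10, 11, 12, 13],  # strongly separates the classes
        [5, 3, 5, 3, 5, 3, 5, 3],      # uninformative (AUC = 0.5)
        [4, 3, 4, 2, 1, 2, 1, 3],      # inversely related, partly redundant
    ]
    kept, selected = hybrid_roc_lasso(features, labels)
    print("after ROC filter:", kept)   # [0, 2]
    print("after Lasso:", selected)    # [0]
```

On the toy data the filter drops the uninformative metric, and Lasso then drops the third metric because, once the first metric is in the model, its remaining correlation with the label falls below `alpha`. In practice, scikit-learn's `roc_auc_score` and `Lasso` (or `LogisticRegression` with `penalty="l1"`) would replace the hand-written parts; plain Python is used here only to keep the sketch dependency-free.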
Pages: 34
Related papers (50 in total)
  • [21] Machine learning techniques for code smell detection: A systematic literature review and meta-analysis
    Azeem, Muhammad Ilyas
    Palomba, Fabio
    Shi, Lin
    Wang, Qing
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 108 : 115 - 138
  • [22] Improving the Performance of Machine Learning with Sequential Feature Selection and Grid Search
    Assegie, Tsehay Admassu
    Murugan, Sangeetha
    Govindarajan, Rajkumar
    Napa, Komal Kumar
    D, D.
    PRZEGLAD ELEKTROTECHNICZNY, 2024, 100 (07) : 140 - 143
  • [23] Android malware detection applying feature selection techniques and machine learning
    Keyvanpour, Mohammad Reza
    Shirzad, Mehrnoush Barani
    Heydarian, Farideh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (06) : 9517 - 9531
  • [25] Review on intrusion detection using feature selection with machine learning techniques
    Kalimuthan, C.
    Renjit, J. Arokia
    MATERIALS TODAY-PROCEEDINGS, 2020, 33 : 3794 - 3802
  • [26] Performance of Machine Learning Techniques in Anomaly Detection with Basic Feature Selection Strategy - A Network Intrusion Detection System
    Pranto, Md Badiuzzaman
    Ratul, Md Hasibul Alam
    Rahman, Md Mahidur
    Diya, Ishrat Jahan
    Zahir, Zunayeed-Bin
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (01) : 36 - 44
  • [27] Lightweight Intrusion Detection Based on Hybrid Feature Selection Machine Learning
    Xia, Guoxin
    Zhao, Yanqiao
    Han, Chaohui
    Zhao, Xiaosong
    Zhang, Lei
    39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024 : 1392 - 1395
  • [28] Revisiting "code smell severity classification using machine learning techniques"
    Hu, Wenhua
    Liu, Lei
    Yang, Peixin
    Zou, Kuan
    Li, Jiajun
    Lin, Guancheng
    Xiang, Jianwen
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023 : 840 - 849
  • [29] AI-enabled approach for enhancing obfuscated malware detection: a hybrid ensemble learning with combined feature selection techniques
    Hossain, Md. Alamgir
    Haque, Md Alimul
    Ahmad, Sultan
    Abdeljaber, Hikmat A. M.
    Eljialy, A. E. M.
    Alanazi, Abed
    Sonal, Deepa
    Chaudhary, Kiran
    Nazeer, Jabeen
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024
  • [30] Impact of Feature Selection Techniques on the Performance of Machine Learning Models for Depression Detection Using EEG Data
    Hassan, Marwa
    Kaabouch, Naima
    APPLIED SCIENCES-BASEL, 2024, 14 (22)