Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection

Cited by: 38
Authors
Jain, Shivani [1]
Saha, Anju [1]
Affiliation
[1] GGS Indraprastha Univ, USIC&T, Sect 16 C, Delhi 110078, India
Keywords
Code smell; Machine learning; Ensemble machine learning; Hybrid feature selection; Stacking; CLASSIFIER; REGRESSION; DESIGN;
DOI
10.1016/j.scico.2021.102713
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification codes
081202; 0835;
Abstract
Maintaining large and complex software is a significant task in the IT industry. One reason is the emergence of code smells, design flaws that can lead to future bugs and errors. Code smells can be addressed through regular refactoring, and detecting them is the first step in the software maintenance process. Detecting code smells with machine learning algorithms eliminates the need for extensive knowledge of code smell properties and threshold values. Ensemble machine learning algorithms combine several classifiers, of the same or different types, to further improve performance and reduce variance. In this study, three hybrid feature selection techniques were combined with ensemble machine learning algorithms to improve code smell detection performance. Seven machine learning classifiers with different kernel variations were implemented, along with three boosting designs, two stacking methods, and bagging. For feature selection, filter-wrapper, filter-embedded, and wrapper-embedded combinations were applied. Performance measures for detecting four code smells were evaluated and compared with those obtained without feature selection. Applying hybrid feature selection increased accuracy by 21.43%, AUC by 53.24%, and F-measure by 76.06%. Univariate ROC with Lasso was the best hybrid feature selection technique, with 90.48% accuracy and a ROC AUC of 94.5%. Random Forest and Logistic Regression were the best-performing machine learning classifiers, and Data Class was the most detectable code smell. Stacking always gave better results than the individual classifiers. (C) 2021 Elsevier B.V. All rights reserved.
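The abstract reports Univariate ROC with Lasso, a filter-embedded combination, as the best hybrid feature selection technique. The following is a minimal sketch of that general idea, not the authors' implementation: a filter step ranks each metric by how well it alone separates smelly from clean classes (its ROC AUC), and an embedded step fits a Lasso (L1-penalized) model on the surviving metrics and keeps those with non-zero coefficients. The function names, the `keep` cutoff, the `alpha` penalty, and fitting Lasso to the 0/1 smell label as a regression target are all illustrative assumptions. It uses only the standard library so it runs without extra packages.

```python
# Hypothetical sketch: univariate ROC AUC filter + Lasso (L1) embedded selection.
# Assumed (not from the paper): names, `keep`, `alpha`, and Lasso on 0/1 labels.
from statistics import mean, pstdev


def univariate_auc(values, labels):
    """ROC AUC of one feature used directly as a score (Mann-Whitney form).

    A positive/negative tie counts as half a win; assumes both classes occur.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_filter(X, y, keep):
    """Filter step: keep the `keep` features whose AUC lies farthest from 0.5.

    |AUC - 0.5| keeps features that separate the classes in either direction.
    """
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(univariate_auc(column, y) - 0.5), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


def standardize(column):
    """Zero mean, unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    sigma = sigma or 1.0  # constant column: becomes all zeros
    return [(v - mu) / sigma for v in column]


def soft_threshold(z, gamma):
    """Proximal operator of the L1 norm."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_select(X, y, features, alpha=0.1, n_iter=200):
    """Embedded step: Lasso via cyclic coordinate descent.

    Minimizes (1/2n)*||y - mean(y) - Z w||^2 + alpha*||w||_1 over the
    standardized columns Z of `features`; returns features with w != 0.
    """
    n = len(y)
    cols = [standardize([row[j] for row in X]) for j in features]
    residual = [yi - mean(y) for yi in y]  # centering replaces an intercept
    w = [0.0] * len(cols)
    for _ in range(n_iter):
        for k, col in enumerate(cols):
            # Scaled inner product of column k with the partial residual.
            rho = sum(c * (r + c * w[k]) for c, r in zip(col, residual)) / n
            # Non-constant standardized columns have (1/n)*||z_k||^2 == 1, so no
            # rescaling is needed; an all-zero column gives rho == 0 and stays 0.
            new_wk = soft_threshold(rho, alpha)
            delta = new_wk - w[k]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[k] = new_wk
    return [j for j, wk in zip(features, w) if abs(wk) > 1e-12]


def hybrid_select(X, y, keep=3, alpha=0.1):
    """ROC filter first, then Lasso on the surviving features."""
    return lasso_select(X, y, roc_filter(X, y, keep), alpha=alpha)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is weakly
    # related noise, feature 2 is constant.
    X = [[1, 5, 2], [2, 1, 2], [3, 4, 2], [4, 2, 2],
         [5, 3, 2], [6, 2, 2], [7, 4, 2], [8, 1, 2]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(roc_filter(X, y, keep=2))                # [0, 1]
    print(hybrid_select(X, y, keep=2, alpha=0.1))  # [0]
```

On this toy data the ROC filter drops the constant feature, and Lasso with `alpha=0.1` then zeroes the weakly related feature 1; lowering `alpha` to 0.01 keeps both. In practice the cutoff and penalty would be tuned by cross-validation, and a library such as scikit-learn (`roc_auc_score`, `Lasso`) would likely replace these hand-rolled helpers.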
Pages: 34