A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique

Authors
Rajwant Singh Rao
Seema Dewangan
Alok Mishra
Manjari Gupta
Affiliations
[1] Guru Ghasidas Vishwavidyalaya, Department of Computer Science and Information Technology
[2] Norwegian University of Science and Technology, Faculty of Engineering
[3] Banaras Hindu University, (Computer Science), DST
Abstract
Detecting code smells can substantially reduce maintenance costs and improve source code quality. Code smells help developers and researchers recognize several types of design flaws. High-severity code smells can cause significant problems for the software and make the system harder to maintain. Assessing the severity of detected code smells is therefore essential, because it helps prioritize refactoring efforts. Class imbalance makes code smell severity detection more difficult still. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are used to detect code smell severity. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied. The relevant features of each dataset are chosen using a feature selection technique based on principal component analysis (PCA). Severity is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. The study obtained a severity accuracy score of 0.99 with the Random forest and Decision tree approaches on the Long method code smell. The models are compared on accuracy and three other performance measures (Precision, Recall, and F-measure). Performance with and without SMOTE is also compared and presented. The results are promising and may pave the way for further studies in this area.
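The abstract names SMOTE as the class-balancing step, but this record gives no implementation details. The following is a minimal sketch of the interpolation idea behind SMOTE, assuming numeric features and Euclidean distance. The function name `smote_oversample`, the parameter `k`, and the toy data and labels are illustrative assumptions, not taken from the study, which may well have used a library implementation.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Oversample minority classes by interpolating between same-class neighbours.

    X: list of feature vectors (lists of floats); y: list of class labels.
    Returns new lists (X_balanced, y_balanced). Every class with at least two
    samples is grown to the size of the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]
    y_out = list(y)

    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue  # largest class already, or no neighbour to interpolate with
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest same-class neighbours of the chosen sample (excluding itself)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> neighbour
            synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: six samples of class 0 and three of class 3.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 0, 0, 0, 3, 3, 3]

X_bal, y_bal = smote_oversample(X, y, k=2)
print(Counter(y_bal))
```

After the call, both toy classes have six samples, and each synthetic point lies on a line segment between two original minority samples. In an evaluation setting, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the test data; whether the study did so is not stated here.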