Performance Analysis of Machine Learning Algorithms on Imbalanced Datasets Using SMOTE Technique

Authors
Kumar, Bala Santhosh [1]
Yadav, Pasupula Praveen [1]
Prasad, P. Penchala [1]
Affiliations
[1] G Pulla Reddy Engn Coll, Comp Sci & Engn Dept, Kurnool, India
Keywords
Machine Learning; SMOTE; Accuracy
DOI
10.1007/978-981-97-8031-0_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research paper investigates the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of several machine learning algorithms on an imbalanced dataset. Imbalanced datasets are a common problem in many real-world applications, where one class is much more prevalent than the other. This imbalance can lead to biased models in which the majority class dominates the predictions and the minority class is often misclassified. To address this problem, we applied the SMOTE algorithm to generate synthetic data for the minority class. We evaluated several popular machine learning algorithms, including logistic regression, decision trees, ensemble learning, support vector machines, neural networks, and an AutoML approach, on both the original imbalanced dataset and the SMOTE-augmented dataset. The experimental results demonstrate that using SMOTE significantly improves the accuracy of the machine learning algorithms on imbalanced datasets. In conclusion, our research highlights the importance of considering the impact of imbalanced datasets on the performance of machine learning algorithms and demonstrates the effectiveness of SMOTE in addressing this issue. Our results can help practitioners working with imbalanced datasets choose an appropriate machine learning algorithm and decide whether to use SMOTE to improve their model's performance.
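The oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the code uses only the Python standard library. It shows the core idea of SMOTE, which is to create each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating
    between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (assumed): 4 minority samples against 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 20
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. In an evaluation like the one the abstract describes, oversampling would normally be applied to the training split only, so that the test set still reflects the original class distribution.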
Pages: 147-156
Page count: 10