Performance Analysis of Machine Learning Algorithms on Imbalanced Datasets Using SMOTE Technique

Cited by: 0
Authors
Kumar, Bala Santhosh [1]
Yadav, Pasupula Praveen [1]
Prasad, P. Penchala [1]
Affiliation
[1] G. Pulla Reddy Engineering College, Department of Computer Science and Engineering, Kurnool, India
Keywords
Machine Learning; SMOTE; Accuracy
DOI
10.1007/978-981-97-8031-0_15
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of several machine learning algorithms on an imbalanced dataset. Imbalanced datasets are a common problem in many real-world applications, where one class is much more prevalent than the other. This imbalance can lead to biased models: the majority class dominates the model's predictions, and the minority class is often misclassified. To address this problem, we applied SMOTE to generate synthetic samples for the minority class. We evaluated several popular machine learning algorithms, including logistic regression, decision trees, ensemble learning, support vector machines, neural networks, and an AutoML approach, on both the original imbalanced dataset and the SMOTE-augmented dataset. The experimental results demonstrate that SMOTE significantly improves the accuracy of these algorithms on imbalanced data. Our research highlights the importance of considering how class imbalance affects the performance of machine learning algorithms and demonstrates the effectiveness of SMOTE in addressing this issue. The results can help practitioners working with imbalanced datasets choose an appropriate machine learning algorithm and decide whether to apply SMOTE to improve model performance.
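The abstract does not say which SMOTE implementation the authors used. As a rough illustration of what the oversampling step does, the sketch below generates each synthetic minority sample by picking a minority point, picking one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them. The function names `smote` and `k_nearest`, the default `k=5`, the `seed` parameter, and the toy data are hypothetical choices made for this example. Base points are drawn at random (a common variant) rather than cycled through one by one as in the original SMOTE description. It uses only the Python standard library and brute-force neighbour search, so it is meant for small illustrative inputs, not as a reference implementation.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Hypothetical helper: indices of the k nearest neighbours of points[idx]
    by Euclidean distance, excluding the point itself (brute force)."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Point]:
    """Hypothetical SMOTE-style sketch (not the paper's code): create
    n_synthetic minority samples by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]

    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample, drawn at random
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Made-up toy data: 4 minority points, topped up to a majority size of 20.
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    majority_count = 20
    new_points = smote(minority, n_synthetic=majority_count - len(minority), k=3)
    print(len(minority) + len(new_points))  # 20
```

For real experiments, the `SMOTE` class in the imbalanced-learn library (`imblearn.over_sampling.SMOTE`, used via `fit_resample`) is a widely used implementation. Resampling is normally fitted on the training split only, so that synthetic points interpolated from test samples do not leak into the evaluation.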
Pages: 147-156
Page count: 10
Related papers
50 items in total
  • [21] ECG Classification using Machine Learning Techniques and SMOTE Oversampling Technique
    Zhong, Zhang Xing
    Michael, Akotonou J.
    Lun, Zhao Jie
    Yue, Dong Hong
    PROCEEDINGS OF 2020 2ND INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MACHINE VISION AND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND MACHINE LEARNING, IPMV 2020, 2020: 10-13
  • [22] Interpretable machine learning for imbalanced credit scoring datasets
    Chen, Yujia
    Calabrese, Raffaella
    Martin-Barragan, Belen
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 312(1): 357-372
  • [23] Epileptic Seizure Detection for Imbalanced Datasets Using an Integrated Machine Learning Approach
    Masum, Mohammad
    Shahriar, Hossain
    Haddad, Hisham M.
    42ND ANNUAL INTERNATIONAL CONFERENCES OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020: 5416-5419
  • [24] Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling
    Islah, Nizar
    Koerner, Jamie
    Genov, Roman
    Valiante, Taufik A.
    O'Leary, Gerard
    42ND ANNUAL INTERNATIONAL CONFERENCES OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020: 112-115
  • [25] Analysis of three intrusion detection system benchmark datasets using machine learning algorithms
    Kayacik, HG
    Zincir-Heywood, N
    INTELLIGENCE AND SECURITY INFORMATICS, PROCEEDINGS, 2005, 3495: 362-367
  • [26] Performance Analysis on Student Feedback using Machine Learning Algorithms
    Katragadda, Sharnitha
    Ravi, Varshitha
    Kumar, Prasanna
    Lakshmi, G. Jaya
    2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020: 1161-1163
  • [27] Adaptive Weighting with SMOTE for Learning from Imbalanced Datasets: A Case Study for Traffic Offence Prediction
    Bobbili, Naga Prasanthi
    Cretu, Ana-Maria
    2018 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND VIRTUAL ENVIRONMENTS FOR MEASUREMENT SYSTEMS AND APPLICATIONS (CIVEMSA), 2018
  • [28] Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection
    Verbiest, Nele
    Ramentol, Enislay
    Cornelis, Chris
    Herrera, Francisco
    APPLIED SOFT COMPUTING, 2014, 22: 511-517
  • [29] Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints
    Bae, Su-Yong
    Lee, Jonga
    Jeong, Jaeseong
    Lim, Changwon
    Choi, Jinhee
    COMPUTATIONAL TOXICOLOGY, 2021, 20
  • [30] Performance and model complexity on imbalanced datasets using resampling and cost-sensitive algorithms
    Freitas Junior, Jairo da Silva
    Pisani, Paulo Henrique
    FOURTH INTERNATIONAL WORKSHOP ON LEARNING WITH IMBALANCED DOMAINS: THEORY AND APPLICATIONS, VOL 183, 2022: 83-97