Analysis of SMOTE: Modified for Diverse Imbalanced Datasets Under the IoT Environment

Cited by: 3
Authors
Bansal, Ankita [1]
Saini, Makul [1]
Singh, Rakshit [1]
Yadav, Jai Kumar [1]
Affiliation
[1] Netaji Subhas Univ Technol, Delhi, India
Keywords
Class Imbalance Problem; Confusion Matrix; Data Sampling; Fraud Detection; Machine Learning Classifiers; Oversampling; Random Oversampling; Undersampling; Classification
DOI
10.4018/IJIRR.2021040102
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The tremendous amount of data generated through IoT can be imbalanced, causing the class imbalance problem (CIP). CIP is one of the major issues in machine learning: most of the samples belong to one class, which produces biased classifiers. In this paper, the authors work on four imbalanced datasets from diverse domains. The objective of the study is to deal with CIP using oversampling techniques. One commonly used oversampling approach is the synthetic minority oversampling technique (SMOTE). The authors suggest modifications to SMOTE and propose their own algorithm, SMOTE-modified (SMOTE-M). To provide a fair evaluation, it is compared with three oversampling approaches: SMOTE, adaptive synthetic oversampling (ADASYN), and SMOTE-Adaboost. To evaluate the performance of the sampling approaches, models are constructed using four classifiers (K-nearest neighbour, decision tree, naive Bayes, logistic regression) on balanced and imbalanced datasets. The study shows that the results of SMOTE-M are comparable to those of ADASYN and SMOTE-Adaboost.
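For orientation, the baseline SMOTE mentioned in the abstract creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbours. The sketch below illustrates that standard interpolation step only. It is not the authors' SMOTE-M, whose modifications the abstract does not describe. The function name `smote_sketch` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: baseline SMOTE-style interpolation (not SMOTE-M).

    X_min : array of minority-class samples, shape (n, d)
    n_new : number of synthetic samples to generate
    k     : number of nearest minority neighbours to choose from
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point lies on a segment between two existing minority samples, the output stays inside the region already occupied by the minority class. In practice the synthetic rows would be appended to the training set before fitting a classifier such as those listed in the abstract.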
Pages: 15-37
Number of pages: 23