Analysis of SMOTE: Modified for Diverse Imbalanced Datasets Under the IoT Environment

Cited by: 3
Authors
Bansal, Ankita [1]
Saini, Makul [1]
Singh, Rakshit [1]
Yadav, Jai Kumar [1]
Affiliations
[1] Netaji Subhas Univ Technol, Delhi, India
Keywords
Class Imbalance Problem; Confusion Matrix; Data Sampling; Fraud Detection; Machine Learning Classifiers; Oversampling; Random Oversampling; Undersampling; CLASSIFICATION;
DOI
10.4018/IJIRR.2021040102
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The tremendous amount of data generated through IoT can be imbalanced, causing the class imbalance problem (CIP). CIP is one of the major issues in machine learning: most of the samples belong to one class, which produces biased classifiers. In this paper, the authors work on four imbalanced datasets from diverse domains. The objective of the study is to deal with CIP using oversampling techniques. One commonly used oversampling approach is the synthetic minority oversampling technique (SMOTE). The authors suggest modifications to SMOTE and propose their own algorithm, SMOTE-modified (SMOTE-M). To provide a fair evaluation, it is compared with three oversampling approaches: SMOTE, adaptive synthetic oversampling (ADASYN), and SMOTE-Adaboost. To evaluate the performance of the sampling approaches, models are constructed using four classifiers (K-nearest neighbour, decision tree, naive Bayes, logistic regression) on balanced and imbalanced datasets. The study shows that the results of SMOTE-M are comparable to those of ADASYN and SMOTE-Adaboost.
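For readers unfamiliar with the baseline, the following is a minimal sketch of the interpolation step in classic SMOTE, the technique the abstract takes as its starting point. It is not the authors' SMOTE-M, whose modifications are not described in this record. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by classic SMOTE-style interpolation.

    minority: list of equal-length numeric feature lists (minority class only).
    This is an illustrative sketch, not the SMOTE-M algorithm from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up): 4 minority samples versus 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(10)]

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because each synthetic point is an interpolation between two existing minority samples, it always falls inside the region already spanned by the minority class. In practice, a maintained implementation such as the one in the imbalanced-learn library would normally be preferred over a hand-written version.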
Pages: 15-37
Page count: 23
Related papers
50 in total
  • [21] A modified adaptive synthetic sampling method for learning imbalanced datasets
    Hussein, Ahmed Saad
    Li, Tianrui
    Abd Ali, Doaa Mohsin
    Bashir, Kamal
    Yohannese, Chubato Wondaferaw
    [J]. DEVELOPMENTS OF ARTIFICIAL INTELLIGENCE TECHNOLOGIES IN COMPUTATION AND ROBOTICS, 2020, 12 : 76 - 83
  • [22] Under-sampling class imbalanced datasets by combining clustering analysis and instance selection
    Tsai, Chih-Fong
    Lin, Wei-Chao
    Hu, Ya-Han
    Yao, Guan-Ting
    [J]. INFORMATION SCIENCES, 2019, 477 : 47 - 54
  • [23] Software fault prediction with imbalanced datasets using SMOTE-Tomek sampling technique and Genetic Algorithm models
    Gupta, Mansi
    Rajnish, Kumar
    Bhattacharjee, Vandana
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (16) : 47627 - 47648
  • [24] Software fault prediction with imbalanced datasets using SMOTE-Tomek sampling technique and Genetic Algorithm models
    Mansi Gupta
    Kumar Rajnish
    Vandana Bhattacharjee
    [J]. Multimedia Tools and Applications, 2024, 83 : 47627 - 47648
  • [25] Classification of Datasets Used in Data Anonymization for IoT Environment
    Medkova, Jana
    [J]. ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, IEA-AIE 2024, 2024, 14748 : 80 - 92
  • [26] Effect of De-noising by Wavelet Filtering and Data Augmentation by Borderline SMOTE on the Classification of Imbalanced Datasets of Pig Behavior
    Jin, Min
    Wang, Chunguang
    Jensen, Dan Borge
    [J]. FRONTIERS IN ANIMAL SCIENCE, 2021, 2
  • [27] A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
    Elreedy, Dina
    Atiya, Amir F.
    Kamalov, Firuz
    [J]. MACHINE LEARNING, 2024, 113 (07) : 4903 - 4923
  • [28] Variable Importance Analysis in Imbalanced Datasets: A New Approach
    Ahrazem Dfuf, Ismael
    Forte Perez-Minayo, Joaquin
    Mira Mcwilliams, Jose Manuel
    Gonzalez Fernandez, Camino
    [J]. IEEE ACCESS, 2020, 8 : 127404 - 127430
  • [29] SSOMaj-SMOTE-SSOMin: Three-step intelligent pruning of majority and minority samples for learning from imbalanced datasets
    Susan, Seba
    Kumar, Amitesh
    [J]. APPLIED SOFT COMPUTING, 2019, 78 : 141 - 149
  • [30] A Comparative Analysis of Classification Algorithms on Diverse Datasets
    Alghobiri, Muhammad
    [J]. ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2018, 8 (02) : 2790 - 2795