Analysis of SMOTE: Modified for Diverse Imbalanced Datasets Under the IoT Environment

Cited by: 3
Authors
Bansal, Ankita [1]
Saini, Makul [1]
Singh, Rakshit [1]
Yadav, Jai Kumar [1]
Affiliation
[1] Netaji Subhas University of Technology, Delhi, India
Keywords
Class Imbalance Problem; Confusion Matrix; Data Sampling; Fraud Detection; Machine Learning Classifiers; Oversampling; Random Oversampling; Undersampling; Classification
DOI
10.4018/IJIRR.2021040102
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The enormous amount of data generated through the IoT can be imbalanced, giving rise to the class imbalance problem (CIP). CIP is one of the major issues in machine learning: most of the samples belong to one class, which produces classifiers biased toward that class. In this paper, the authors work on four imbalanced datasets from diverse domains. The objective of this study is to address CIP using oversampling techniques. One of the most commonly used oversampling approaches is the synthetic minority oversampling technique (SMOTE). The authors suggest modifications to SMOTE and propose their own algorithm, SMOTE-modified (SMOTE-M). To provide a fair evaluation, SMOTE-M is compared with three oversampling approaches: SMOTE, adaptive synthetic oversampling (ADASYN), and SMOTE-Adaboost. To evaluate the performance of the sampling approaches, models are constructed using four classifiers (k-nearest neighbour, decision tree, naive Bayes, and logistic regression) on both the balanced and the imbalanced datasets. The study shows that the results of SMOTE-M are comparable to those of ADASYN and SMOTE-Adaboost.
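The abstract names SMOTE without describing how it creates samples. The following minimal Python sketch shows standard SMOTE interpolation, assuming numeric feature vectors and Euclidean distance: pick a minority sample, find its k nearest minority-class neighbours, choose one of them, and place a synthetic point at a random position on the line segment between the two. This is not the authors' SMOTE-M, whose modifications are not described here; the function name `smote`, its parameters, and the toy data are hypothetical, for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical sketch of standard SMOTE (not SMOTE-M): create n_synthetic
    points by interpolating between a random minority sample and one of its
    k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy example (hypothetical data): 4 minority samples versus 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10
new = smote(minority, n_synthetic=majority_count - len(minority), k=3, seed=0)
balanced_minority = minority + new
print(len(balanced_minority))  # 10
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies rather than duplicating existing rows, which is what distinguishes SMOTE from random oversampling. The original SMOTE formulation generates a fixed number of synthetic points per minority sample; drawing the base sample at random, as above, is a common simplification. ADASYN instead generates more points around minority samples that have more majority-class samples among their nearest neighbours.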
Pages: 15-37
Page count: 23