Analysis of SMOTE: Modified for Diverse Imbalanced Datasets Under the IoT Environment

Cited by: 3

Authors:

Bansal, Ankita [1]

Saini, Makul [1]

Singh, Rakshit [1]

Yadav, Jai Kumar [1]

Affiliation:

[1] Netaji Subhas Univ Technol, Delhi, India

Source:

INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH | 2021 / Vol. 11 / Issue 02

Keywords:

Class Imbalance Problem; Confusion Matrix; Data Sampling; Fraud Detection; Machine Learning Classifiers; Oversampling; Random Oversampling; Undersampling; Classification

DOI:

10.4018/IJIRR.2021040102

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

The tremendous amount of data generated through the IoT can be imbalanced, causing the class imbalance problem (CIP). CIP is one of the major issues in machine learning: most of the samples belong to one class, which produces biased classifiers. In this paper, the authors work on four imbalanced datasets from diverse domains. The objective of the study is to deal with CIP using oversampling techniques. One of the most commonly used oversampling approaches is the synthetic minority oversampling technique (SMOTE). The authors suggest modifications to SMOTE and propose their own algorithm, SMOTE-modified (SMOTE-M). For a fair evaluation, SMOTE-M is compared with three oversampling approaches: SMOTE, adaptive synthetic oversampling (ADASYN), and SMOTE-Adaboost. To evaluate the performance of the sampling approaches, models are built with four classifiers (k-nearest neighbour, decision tree, naive Bayes, logistic regression) on both the balanced and the imbalanced datasets. The study shows that the results of SMOTE-M are comparable to those of ADASYN and SMOTE-Adaboost.
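
For readers unfamiliar with the baseline, the core step of standard SMOTE can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a minimal NumPy sketch under our own assumptions (Euclidean distance, base samples drawn uniformly at random, and the hypothetical function name `smote`); it is NOT the authors' SMOTE-M, whose modifications this abstract does not describe.

```python
import numpy as np


def smote(minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Sketch of standard SMOTE interpolation (hypothetical helper, not SMOTE-M).

    Assumptions: Euclidean distance, base samples chosen uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    # Pairwise Euclidean distances among minority samples only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbours

    synthetic = np.empty((n_synthetic, minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # uniform in [0, 1)
        # New point lies on the segment from minority[base] towards minority[nb].
        synthetic[i] = minority[base] + gap * (minority[nb] - minority[base])
    return synthetic


if __name__ == "__main__":
    # Toy minority class: four 2-D points (illustrative data, not from the study).
    X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
    X_new = smote(X_min, n_synthetic=6, k=2)
    print(X_new.shape)  # (6, 2)
```

Drawing base samples at random is one common variant; the original formulation instead generates a fixed number of synthetic samples per minority point. For real experiments, a tested implementation such as the `SMOTE` and `ADASYN` classes in imbalanced-learn is preferable. ADASYN differs mainly in generating more synthetic samples around minority points whose neighbourhoods contain many majority-class samples.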


Pages: 15-37

Page count: 23
