Imbalanced data classification using improved synthetic minority over-sampling technique

被引:0
|
作者
Anusha, Yamijala [1 ]
Visalakshi, R. [2 ]
Srinivas, Konda [3 ]
机构
[1] Annamalai Univ, Dept Comp Sci & Engn, Chidambaram 608002, India
[2] Annamalai Univ, Dept Informat Technol, Chidambaram, India
[3] CMR Tech Campus, Dept Comp Sci & Engn Data Sci, Hyderabad, India
关键词
Data cleaning; imbalanced data; min max normalization; random forest; synthetic minority oversampling technique; transformation; FEATURE-SELECTION; BIG; OPTIMIZATION; PREDICTION; ALGORITHM;
D O I
10.3233/MGS-230007
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In data mining, deep learning and machine learning models face class imbalance problems, which result in a lower detection rate for minority class samples. An improved Synthetic Minority Over-sampling Technique (SMOTE) is introduced for effective imbalanced data classification. After collecting the raw data from PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases, the pre-processing is performed using min-max normalization, cleaning, integration, and data transformation techniques to achieve data with better uniqueness, consistency, completeness and validity. An improved SMOTE algorithm is applied to the pre-processed data for proper data distribution, and then the properly distributed data is fed to the machine learning classifiers: Support Vector Machine (SVM), Random Forest, and Decision Tree for data classification. Experimental examination confirmed that the improved SMOTE algorithm with random forest attained significant classification results with Area under Curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E.coli, and Breast cancer Wisconsin databases.
引用
收藏
页码:117 / 131
页数:15
相关论文
共 50 条
  • [1] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    [J]. PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015,
  • [2] Classification of imbalanced PubChem BioAssay data using an efficient algorithm coupled with synthetic minority over-sampling technique
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 247
  • [3] Dynamic Synthetic Minority Over-Sampling Technique-Based Rotation Forest for the Classification of Imbalanced Hyperspectral Data
    Feng, Wei
    Dauphin, Gabriel
    Huang, Wenjiang
    Quan, Yinghui
    Bao, Wenxing
    Wu, Mingquan
    Li, Qiang
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (07) : 2159 - 2169
  • [4] Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification
    Alex, Suja A.
    Nayahi, J. Jesu Vedha
    Kaddoura, Sanaa
    [J]. APPLIED SOFT COMPUTING, 2024, 156
  • [5] AN IMBALANCED SIGNAL MODULATION CLASSIFICATION AND EVALUATION METHOD BASED ON SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
    Liu, Xuebo
    Wang, Yiran
    Bai, Jing
    Li, Haoran
    Wang, Xu
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 6224 - 6227
  • [6] Classification of Advertisement Text on Facebook Using Synthetic Minority Over-Sampling Technique
    Akkaradamrongrat, Suphamongkol
    Kachamas, Pornpimon
    Sinthupinyo, Sukree
    [J]. 2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018,
  • [7] A self-adaptive synthetic over-sampling technique for imbalanced classification
    Gu, Xiaowei
    Angelov, Plamen P.
    Soares, Eduardo A.
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, 35 (06) : 923 - 943
  • [8] A sparrow search algorithm-optimized convolutional neural network for imbalanced data classification using synthetic minority over-sampling technique
    Deng, Wu
    He, Qi
    Zhou, Xiangbing
    Chen, Huayue
    Zhao, Huimin
    [J]. PHYSICA SCRIPTA, 2023, 98 (11)
  • [9] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [10] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kimt, Sungshin
    [J]. 2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017,