Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification

Cited by: 2
Authors
Vairetti, Carla [1, 3]
Assadi, Jose Luis
Maldonado, Sebastian [2, 3]
Affiliations
[1] Univ Los Andes, Fac Ingn & Ciencias Aplicadas, Los Andes, Chile
[2] Univ Chile, Sch Econ & Business, Dept Management Control & Informat Syst, Santiago, Chile
[3] Inst Sistemas Complejos Ingenieri ISCI, Santiago, Chile
Keywords
Imbalanced classification; SMOTE; Big data; Intelligent undersampling; MapReduce; Outcomes; Machine; Insight
DOI
10.1016/j.eswa.2024.123149
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced classification is a well-known challenge in many real-world applications. It arises when the distribution of the target variable is skewed, which biases predictions toward the majority class. With the arrival of the Big Data era, there is a pressing need for efficient solutions to this problem. In this work, we present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling within a MapReduce framework. Both procedures are performed in the same pass over the data, which makes the technique efficient. The SMOTENN method is complemented by an efficient implementation of the neighborhoods of the minority samples. Our experimental results show that this approach outperforms alternative resampling techniques on small- and medium-sized datasets, while achieving positive results on large datasets with reduced running times.
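The idea of combining oversampling and undersampling in one pass over partitioned data can be illustrated with a minimal sketch. This is not the authors' SMOTENN algorithm, whose details the abstract does not give. The function names (`resample_partition`, `map_reduce_resample`) are hypothetical, the in-memory list slices only stand in for MapReduce partitions, and the cleaning rule (drop a majority sample whose nearest neighbor is a minority sample) is an assumption chosen for illustration.

```python
import math
import random


def resample_partition(rows, k=2, rng=None):
    """Map step (illustrative): one loop over a partition that both
    oversamples the minority class (label 1) and cleans the majority (label 0).

    rows is a list of (features, label) pairs, features being a tuple of floats.
    """
    rng = rng or random.Random(0)
    out = []
    for i, (x, y) in enumerate(rows):
        # Distances from this sample to every other sample in the partition.
        others = sorted(
            (math.dist(x, x2), j) for j, (x2, _) in enumerate(rows) if j != i
        )
        if y == 1:
            out.append((x, 1))
            # SMOTE-style step: interpolate toward one of the k nearest
            # minority neighbours found inside this partition.
            nbrs = [j for _, j in others if rows[j][1] == 1][:k]
            if nbrs:
                nb = rows[rng.choice(nbrs)][0]
                gap = rng.random()
                synthetic = tuple(a + gap * (b - a) for a, b in zip(x, nb))
                out.append((synthetic, 1))
        elif not others or rows[others[0][1]][1] == 0:
            # Assumed cleaning rule: keep a majority sample only if its
            # nearest neighbour is also a majority sample.
            out.append((x, 0))
    return out


def map_reduce_resample(data, n_partitions=2, k=2, seed=0):
    """Split the data, resample each partition, then concatenate (reduce)."""
    rng = random.Random(seed)
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    mapped = [resample_partition(p, k, rng) for p in partitions]
    return [row for part in mapped for row in part]
```

Because neighbours are searched only within a partition, the result depends on how the data are split; a real MapReduce job would also need to handle partitions that contain fewer than two minority samples, which this sketch simply leaves without synthetic points.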
Pages: 11