Anomaly detection-based undersampling for imbalanced classification problems

Times cited: 0
Authors
Park, You-Jin [1]
Brito, Paula [2,3]
Ma, Yun-Chen [1]
Affiliations
[1] National Taipei University of Technology, Department of Industrial Engineering and Management, Taipei City, Taiwan
[2] University of Porto, Faculty of Economics, Porto, Portugal
[3] INESC TEC, LIAAD, Porto, Portugal
Keywords
Machine learning; classification; class imbalance; anomaly; undersampling; SMOTE; NOISY
DOI
10.1080/0305215X.2024.2315501
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
In many machine learning applications, classification plays an important role in categorizing data and making predictions. To improve classification performance, it is crucial to identify and remove anomalies. Class imbalance is also a very common problem in machine learning applications, because most classifiers tend to be biased toward the majority class and ignore minority class instances. In this research, we therefore propose a new undersampling technique based on anomaly detection and removal to improve performance on imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets, and two non-parametric hypothesis tests are used to show the statistical difference in classification performance between the proposed method and existing traditional resampling methods. The experiments show that the proposed method improves classification performance by effectively detecting and eliminating anomalies among true-majority or pseudo-majority class instances.
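The abstract does not say which anomaly detector the authors use, so the following is only a minimal sketch of the general idea (anomaly-based undersampling of the majority class) under a loudly stated assumption: it swaps in a simple k-nearest-neighbour distance score, which is not necessarily the paper's method. Each majority-class instance is scored by its mean Euclidean distance to its `k` nearest majority-class neighbours, the top `contamination` fraction is dropped, and every minority-class instance is kept. The helper names `knn_anomaly_scores` and `anomaly_undersample` and the parameters `k` and `contamination` are hypothetical, and the sketch makes no attempt to model the "pseudo-majority" instances mentioned in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_anomaly_scores(points: Sequence[Point], k: int) -> List[float]:
    """Hypothetical helper: mean Euclidean distance from each point to its
    k nearest neighbours in the same set; a larger score = more isolated."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending (O(n^2 log n) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def anomaly_undersample(
    X: Sequence[Point],
    y: Sequence[int],
    majority_label: int,
    k: int = 3,
    contamination: float = 0.1,
) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: remove the most anomalous majority-class instances.

    Assumption: only majority-class points are scored, and only against each
    other; every minority-class instance is kept unchanged.
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    scores = knn_anomaly_scores([X[i] for i in maj_idx], k)
    n_remove = int(contamination * len(maj_idx))
    # Positions within maj_idx, ordered from most to least anomalous.
    ranked = sorted(range(len(maj_idx)), key=lambda r: scores[r], reverse=True)
    removed = {maj_idx[r] for r in ranked[:n_remove]}
    keep = [i for i in range(len(y)) if i not in removed]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: a tight majority cluster plus one far-away majority point (10, 10).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2),
     (0.3, 0.3), (0.7, 0.7), (10, 10), (5, 5), (5, 6), (6, 5)]
y = [0] * 10 + [1] * 3
X_res, y_res = anomaly_undersample(X, y, majority_label=0, k=3, contamination=0.1)
print(len(X_res), y_res.count(0), y_res.count(1))  # → 12 9 3
```

Scoring majority points only against each other reflects the abstract's focus on anomalies among majority-class instances. A practical version would more likely use a dedicated detector (for example an isolation forest or local outlier factor) and vectorised distance computations rather than this quadratic pure-Python loop.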
Pages: 2565-2578
Page count: 14