Anomaly detection-based undersampling for imbalanced classification problems

Cited by: 0
Authors
Park, You-Jin [1]
Brito, Paula [2, 3]
Ma, Yun-Chen [1]
Affiliations
[1] Natl Taipei Univ Technol, Dept Ind Engn & Management, Taipei City, Taiwan
[2] Univ Porto, Fac Econ, Porto, Portugal
[3] INESC TEC, LIAAD, Porto, Portugal
Keywords
Machine learning; classification; class imbalance; anomaly; undersampling; SMOTE; NOISY
DOI
10.1080/0305215X.2024.2315501
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In various machine learning applications, classification plays an important role in categorizing and predicting data. To improve the classification performance, it is crucial to identify and remove the anomalies. Also, class imbalance in many machine learning applications is a very common problem since most classifiers tend to be biased toward the majority class by ignoring the minority class instances. Thus, in this research, we propose a new under-sampling technique based on anomaly detection and removal to enhance the performance of imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets and two non-parametric hypothesis tests are employed to show the statistical difference in classification performances between the proposed method and other traditional resampling methods. From the experiment, it is shown that the proposed method improves the classification performance by effectively detecting and eliminating the anomalies among true-majority or pseudo-majority class instances.
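The abstract does not say which anomaly detector the authors use or how many instances they remove, so the following is only a minimal sketch of the general idea (score the majority-class instances for anomalousness, drop the most anomalous ones, keep everything else). It assumes a simple k-nearest-neighbour distance score; the function names `knn_anomaly_scores` and `anomaly_undersample` and the parameters `remove_fraction` and `k` are hypothetical and do not come from the paper.

```python
import math
import random


def knn_anomaly_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    Assumption: a k-NN distance score stands in for the (unspecified)
    anomaly detector. Requires k < len(points).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def anomaly_undersample(X, y, majority_label, remove_fraction=0.2, k=3):
    """Drop the most anomalous majority-class rows; keep all other rows."""
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    scores = knn_anomaly_scores([X[i] for i in maj_idx], k=k)
    n_remove = int(len(maj_idx) * remove_fraction)
    # Rank majority rows from most to least anomalous.
    ranked = sorted(zip(scores, maj_idx), reverse=True)
    drop = {idx for _, idx in ranked[:n_remove]}
    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 40 majority points near the origin,
# 2 majority outliers, and 8 minority points.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(10.0, 10.0), (-12.0, 9.0)]
y = [0] * 42
X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(8)]
y += [1] * 8

X_res, y_res = anomaly_undersample(X, y, majority_label=0, remove_fraction=0.1, k=3)
```

In this sketch only majority-class rows are ever removed, so the minority class is left intact while the class ratio moves toward balance. A real experiment would swap in a proper detector and tune the removal fraction.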
Pages: 2565-2578
Number of pages: 14
Related papers
50 in total
  • [41] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [42] A Neighborhood Undersampling Stacked Ensemble (NUS-SE) in imbalanced classification
    Seng, Zian
    Kareem, Sameem Abdul
    Varathan, Kasturi Dewi
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 168
  • [43] Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification
    Vairetti, Carla
    Assadi, Jose Luis
    Maldonado, Sebastian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 246
  • [44] A Membership Probability–Based Undersampling Algorithm for Imbalanced Data
    Gilseung Ahn
    You-Jin Park
    Sun Hur
    Journal of Classification, 2021, 38 : 2 - 15
  • [45] MLP-Based Undersampling Technique for Imbalanced Learning
    Babar, Varsha
    Ade, Roshani
    2016 INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL AND DYNAMIC OPTIMIZATION TECHNIQUES (ICACDOT), 2016 : 142 - 147
  • [46] Grid-Based and Outlier Detection-Based Data Clustering and Classification
    Cho, Kyu Cheol
    Lee, Jong Sik
    UBIQUITOUS COMPUTING AND MULTIMEDIA APPLICATIONS, PT I, 2011, 150 : 129 - 138
  • [47] AN ENSEMBLE ANOMALY DETECTION WITH IMBALANCED DATA BASED ON ROBOT VISION
    Wang, Yongxiong
    Sun, Shuxin
    Zhong, Jiandong
    INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION, 2016, 31 (02) : 77 - 83
  • [48] SVM CLASSIFICATION BASED ON THE IMBALANCED DATASETS FOR PROBLEMS OF PSYCHODIAGNOSTICS
    Demidova, Liliya
    Klyueva, Irina
    Pylkin, Alexander
    ICPE 2017: INTERNATIONAL CONFERENCE ON PSYCHOLOGY AND EDUCATION, 2017, 33 : 95 - 103
  • [49] Grid-based & Outlier Detection-based Data Clustering & Classification
    Cho, Kyu Cheol
    Lee, Jong Sik
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (03) : 1253 - 1266
  • [50] NUS: Noisy-Sample-Removed Undersampling Scheme for Imbalanced Classification and Application to Credit Card Fraud Detection
    Zhu, Honghao
    Zhou, MengChu
    Liu, Guanjun
    Xie, Yu
    Liu, Shijun
    Guo, Cheng
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (02) : 1793 - 1804