Solving the class imbalance problem using a counterfactual method for data augmentation

Cited by: 16
Authors
Temraz, Mohammed [1, 2]
Keane, Mark T. [1, 2, 3]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Dublin 4, Ireland
[2] Univ Coll Dublin, Insight Ctr Data Analyt, Dublin 4, Ireland
[3] Univ Coll Dublin, VistaMilk SFI Res Ctr, Dublin 4, Ireland
Source
Funding
Science Foundation Ireland
Keywords
Counterfactual; Class imbalance problem; Data augmentation; XAI; BORDERLINE-SMOTE; SAMPLING METHOD; CLASSIFICATION; EXPLANATIONS; ALGORITHM;
DOI
10.1016/j.mlwa.2022.100375
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning from class-imbalanced datasets poses challenges for many machine learning algorithms. Many real-world domains are, by definition, class imbalanced by virtue of having a majority class that naturally has many more instances than its minority class (e.g., genuine bank transactions occur much more often than fraudulent ones). Many methods have been proposed to solve the class imbalance problem, among the most popular being oversampling techniques (such as SMOTE). These methods generate synthetic instances in the minority class to balance the dataset, performing data augmentations that improve the performance of predictive machine learning (ML). In this paper, we advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class. Unlike other oversampling techniques, this method adaptively combines existing instances from the dataset, using actual feature-values rather than interpolating values between instances. Several experiments using four different classifiers and 25 datasets involving binary classes are reported, which show that this Counterfactual Augmentation (CFA) method generates useful synthetic datapoints in the minority class. The experiments also show that CFA is competitive with many other oversampling methods, many of which are variants of SMOTE. The basis for CFA's performance is discussed, along with the conditions under which it is likely to perform better or worse in future tests.
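The contrast the abstract draws, between interpolating values and reusing actual feature-values, can be illustrated with a small sketch. This is not the authors' CFA algorithm: the function names, the toy instances, and the rule for choosing which features to copy (the features on which a majority/minority pair differs) are all assumptions made for illustration only.

```python
import numpy as np


def smote_style_interpolate(a, b, t):
    """Return a point on the line segment between a and b (0 <= t <= 1).

    Interpolation can produce feature values that occur in no real instance.
    """
    return a + t * (b - a)


def recombine_actual_values(base, donor, take_from_donor):
    """Build a new instance by copying whole feature values.

    Features flagged True in take_from_donor come from donor,
    all others come from base. No new values are invented.
    """
    return np.where(take_from_donor, donor, base)


# Hypothetical toy instances with three numeric features.
minority_a = np.array([5.0, 4.0, 80.0])
minority_b = np.array([7.0, 2.0, 60.0])
paired_majority = np.array([5.0, 1.0, 30.0])  # differs from minority_a in features 1 and 2
other_majority = np.array([6.0, 1.0, 30.0])   # a majority instance without a minority pair

# SMOTE-style: a midpoint between two minority instances.
interpolated = smote_style_interpolate(minority_a, minority_b, 0.5)

# Counterfactual-style (assumed rule): copy the features on which the
# paired instances differ from the minority instance, keep the rest
# from the unpaired majority instance.
difference_mask = paired_majority != minority_a
synthetic = recombine_actual_values(other_majority, minority_a, difference_mask)

print("interpolated:", interpolated)
print("synthetic:   ", synthetic)
```

In this toy case the interpolated point contains values that appear in neither source instance, whereas every value in the recombined instance is taken unchanged from an existing instance.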
Pages: 16
Related papers
50 in total
  • [1] EEG data augmentation: towards class imbalance problem in sleep staging tasks
    Fan, Jiahao
    Sun, Chenglu
    Chen, Chen
    Jiang, Xinyu
    Liu, Xiangyu
    Zhao, Xian
    Meng, Long
    Dai, Chenyun
    Chen, Wei
    [J]. JOURNAL OF NEURAL ENGINEERING, 2020, 17 (05)
  • [2] A novel data augmentation approach to fault diagnosis with class-imbalance problem
    Tian, Jilun
    Jiang, Yuchen
    Zhang, Jiusi
    Luo, Hao
    Yin, Shen
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2024, 243
  • [3] Generative adversarial network augmentation for solving the training data imbalance problem in crop classification
    Shumilo, Leonid
    Okhrimenko, Anton
    Kussul, Nataliia
    Drozd, Sofiia
    Shkalikov, Oleh
    [J]. REMOTE SENSING LETTERS, 2023, 14 (11) : 1131 - 1140
  • [4] THE METHODS FOR QUANTITATIVE SOLVING THE CLASS IMBALANCE PROBLEM
    Kavrin, D. A.
    Subbotin, S. A.
    [J]. RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2018, (01) : 83 - 90
  • [5] A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data
    Ahlawat, Khyati
    Chug, Anuradha
    Singh, Amit Prakash
    [J]. ADVANCES IN DATA SCIENCE AND ADAPTIVE ANALYSIS, 2021, 13 (02)
  • [6] A learning method for the class imbalance problem with medical data sets
    Li, Der-Chiang
    Liu, Chiao-Wen
    Hu, Susan C.
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2010, 40 (05) : 509 - 518
  • [7] Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection
    Liu, Lijue
    Wu, Xiaoyu
    Li, Shihao
    Li, Yi
    Tan, Shiyang
    Bai, Yongping
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)
  • [8] Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection
    Lijue Liu
    Xiaoyu Wu
    Shihao Li
    Yi Li
    Shiyang Tan
    Yongping Bai
    [J]. BMC Medical Informatics and Decision Making, 22
  • [9] Improving classification of mature microRNA by solving class imbalance problem
    Wang, Ying
    Li, Xiaoye
    Tao, Bairui
    [J]. SCIENTIFIC REPORTS, 2016, 6
  • [10] Improving classification of mature microRNA by solving class imbalance problem
    Ying Wang
    Xiaoye Li
    Bairui Tao
    [J]. Scientific Reports, 6