Data Augmentation Meta-Classifier Scheme for imbalanced data sets

Cited by: 0
Authors
Moreno-Barea, Francisco J. [1]
Jerez, Jose M. [1]
Franco, Leonardo [1]
Affiliations
[1] Univ Malaga, Dept Lenguajes & Ciencias Computac, Escuela Tecn Super Ingn Informat, Malaga, Spain
Keywords
Data augmentation; Imbalance learning; Data Mining; Meta-Classifier; SMOTE
DOI
10.1109/SSCI51031.2022.10022209
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Categorical data obtained from real-world domains are commonly imbalanced, as they often contain more samples belonging to one of the classes than to the others. Imbalanced data tend to be a problem for classifiers, because the majority class biases them and degrades overall performance. Among the techniques used for dealing with imbalanced data sets, data augmentation (DA) is one alternative: it can improve prediction accuracy for the minority class (usually the relevant one), but usually at the cost of worse predictions for the majority class. To benefit from both behaviours, this study introduces a meta-classifier scheme that works as a mixture of two classifiers, one trained on the original data and the other trained on augmented data. Experiments carried out on 12 imbalanced data sets, 5 of them obtained from the TCGA database and related to cancer survival prediction, show an improvement in accuracy, area under the ROC curve, and Matthews correlation coefficient values compared with the results obtained using the original data sets.
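The abstract does not say how the two classifiers are combined, which base learners are used, or how the data are augmented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a soft-voting mixture (a weighted average of the two predicted minority-class probabilities), a toy centroid-based classifier with class priors, and a simplified SMOTE-like interpolation between random minority pairs. All names (`smote_like`, `CentroidClassifier`, `meta_predict`, `mcc`) are hypothetical and are not taken from the paper.

```python
import math
import random


def smote_like(minority, n_new, rng):
    """Create n_new synthetic points by interpolating between random minority pairs.
    Simplification (assumption): real SMOTE interpolates toward a k-nearest neighbour."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


class CentroidClassifier:
    """Toy binary classifier (assumption): class score = log prior - squared distance to centroid."""

    def fit(self, X, y):
        self.centroids, self.log_prior = {}, {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
            self.log_prior[c] = math.log(len(rows) / len(X))
        return self

    def predict_proba(self, x):
        """Return the estimated probability that x belongs to class 1 (the minority)."""
        score = {
            c: self.log_prior[c]
            - sum((xi - mi) ** 2 for xi, mi in zip(x, self.centroids[c]))
            for c in (0, 1)
        }
        return 1.0 / (1.0 + math.exp(score[0] - score[1]))


def meta_predict(clf_orig, clf_aug, x, weight=0.5, threshold=0.5):
    """Mix the two classifiers by a weighted average of their class-1 probabilities."""
    p = weight * clf_orig.predict_proba(x) + (1.0 - weight) * clf_aug.predict_proba(x)
    return int(p >= threshold), p


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy imbalanced data: 40 majority (label 0) vs 6 minority (label 1) samples.
rng = random.Random(0)
majority = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
minority = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(6)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

# Augment only the minority class until both classes have the same size.
synthetic = smote_like(minority, len(majority) - len(minority), rng)
X_aug = X + synthetic
y_aug = y + [1] * len(synthetic)

clf_orig = CentroidClassifier().fit(X, y)        # trained on the original data
clf_aug = CentroidClassifier().fit(X_aug, y_aug)  # trained on the augmented data

y_pred = [meta_predict(clf_orig, clf_aug, x)[0] for x in X]
print("MCC of the mixture on the toy training data:", round(mcc(y, y_pred), 3))
```

In this sketch the `weight` parameter controls how much the mixture leans on the classifier trained with the original data (which tends to favour the majority class) versus the one trained with augmented data (which tends to favour the minority class). A real evaluation would use held-out data rather than the training samples scored here.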
Pages: 1392-1399
Page count: 8