A new data complexity measure for multi-class imbalanced classification tasks

Authors
Han, Mingming [1]
Guo, Husheng [1,2]
Wang, Wenjian [1,2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat P, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Data characteristic; Skewed distribution; Correlation; Multi-class;
DOI
10.1016/j.patcog.2024.110881
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Skewed class distributions and data complexity can severely degrade imbalanced classification results. The cost of classification can be significantly reduced if data complexity is measured and the data are pre-processed accordingly prior to training, particularly for large-scale and high-dimensional datasets. Although many methods have been proposed to evaluate data complexity, most of them do not fully consider the interactions among different data characteristics or the connection between class imbalance and these characteristics, which makes it difficult to evaluate classification difficulty effectively. This paper presents a new data complexity measure, MFII (multi-factor imbalance index), which measures the combined effects of the skewed class distribution and data characteristics on classification difficulty. In particular, it investigates the impact of overlap size, confusion degree, and sub-cluster structure. Two criteria, VoR (value of resolution) and DoC (degree of consistency), are also proposed to evaluate the resolution and interpretability of complexity measures. The experimental results demonstrate that MFII has a lower VoR and a stronger correlation with classification metrics, which indicates that MFII can more accurately evaluate the difficulty of multi-class imbalanced classification tasks.
Pages: 13
Related papers
50 in total
  • [1] F-Measure Optimization for Multi-class, Imbalanced Emotion Classification Tasks
    Inan, Toki Tahmid
    Liu, Mingrui
    Shehu, Amarda
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT I, 2022, 13529 : 158 - 170
  • [2] A survey of multi-class imbalanced data classification methods
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2471 - 2501
  • [3] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [4] A Combination Method for Multi-Class Imbalanced Data Classification
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    [J]. 2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 365 - 368
  • [5] Selecting local ensembles for multi-class imbalanced data classification
    Krawczyk, Bartosz
    Cano, Alberto
    Wozniak, Michal
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [6] Undersampling with Support Vectors for Multi-Class Imbalanced Data Classification
    Krawczyk, Bartosz
    Bellinger, Colin
    Corizzo, Roberto
    Japkowicz, Nathalie
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [8] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    [J]. INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614
  • [9] Data Complexity Measures for Imbalanced Classification Tasks
    Barella, Victor H.
    Garcia, Luis P. F.
    de Souto, Marcilio P.
    Lorena, Ana C.
    de Carvalho, Andre
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [10] Boosting methods for multi-class imbalanced data classification: an experimental review
    Tanha, Jafar
    Abdi, Yousef
    Samadi, Negin
    Razzaghi, Nazila
    Asadpour, Mohammad
    [J]. Journal of Big Data, 7