A new data complexity measure for multi-class imbalanced classification tasks

Authors
Han, Mingming [1]
Guo, Husheng [1, 2]
Wang, Wenjian [1, 2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat P, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Data characteristic; Skewed distribution; Correlation; Multi-class
DOI
10.1016/j.patcog.2024.110881
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A skewed class distribution and data complexity can severely degrade results on imbalanced classification tasks. The cost of classification can be reduced significantly if data complexity is measured and addressed through pre-processing before training, particularly for large-scale, high-dimensional datasets. Although many methods have been proposed to evaluate data complexity, most fail to fully consider the interaction among different data characteristics, or the connection between class imbalance and those characteristics, which makes it hard to evaluate classification difficulty effectively. This paper presents a new data complexity measure, MFII (multi-factor imbalance index), which measures the combined effect of the skewed class distribution and data characteristics on classification difficulty. In particular, it investigates the impact of overlap size, confusion degree, and sub-cluster structure. Two further criteria, VoR (value of resolution) and DoC (degree of consistency), are proposed to evaluate the resolution and interpretability of complexity measures. The experimental results show that MFII has a lower VoR and a stronger correlation with classification metrics, indicating that MFII evaluates the difficulty of multi-class imbalanced classification tasks more accurately.
Pages: 13