Data Entropy-Based Imbalanced Learning

Authors
Fan, Yutao [1, 2, 3, 4]
Huang, Heming [1, 2, 3]
Affiliations
[1] Qinghai Normal Univ, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] North China Inst Sci & Technol, Beijing 065201, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data entropy; deep learning; imbalanced learning; NETWORKS;
DOI
10.1007/978-3-031-67871-4_7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning, the skewness of observations across classes has long been regarded as the cause of poor classification performance, and in particular of the bias in classification performance among classes. Our recent study challenges this notion. We argue that the bias in classification performance stems from an imbalance in the information carried by the classes, not merely from an imbalance in the number of observations. To reflect this information imbalance, we propose an indicator, data entropy, that captures the randomness within each class. A dataset whose classes have balanced and higher data entropies is more likely to exhibit improved classification performance. We further propose a second indicator, data mutual information, that quantifies the similarity between classes. Higher values indicate that a model can leverage learning from classes to enhance its learning capacity. Reducing the difference in data entropy between classes while simultaneously increasing data mutual information is therefore advantageous for classification. Our experiments, conducted with four models (SVM, CNN, Transformer, including its variant ViT, and DNN) on four datasets (CIFAR-10, Airline Satisfaction, Smoking Body Signal, and Liver Cirrhosis), validate the efficacy of the proposed indicators. By using resampling techniques on the Liver Cirrhosis dataset to rebalance the data entropy distribution among classes and to increase both the within-class data entropy and the data mutual information, we observe classification improvements, measured by d-index, across all four models.
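The abstract does not give the formula for data entropy, so the following is only a minimal sketch of the general idea, under the assumption that within-class randomness can be approximated by the Shannon entropy of a histogram over a class's feature values. The function names (`class_entropy`, `per_class_entropy`), the bin count, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def class_entropy(values, bins=16, value_range=(0.0, 1.0)):
    """Shannon entropy (in bits) of a histogram over one class's feature values.

    Assumption: this histogram-based estimate is a stand-in for the paper's
    "data entropy", whose exact definition is not given in the abstract.
    """
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total  # drop empty bins to avoid log2(0)
    return float(-(p * np.log2(p)).sum())

def per_class_entropy(X, y, bins=16, value_range=(0.0, 1.0)):
    """Return {class label: entropy estimate} for every class in y."""
    return {
        int(c): class_entropy(X[y == c].ravel(), bins, value_range)
        for c in np.unique(y)
    }

# Synthetic illustration (hypothetical data): class 0 is large and spread out,
# class 1 is small and concentrated in a narrow band of values.
rng = np.random.default_rng(0)
X_major = rng.uniform(0.0, 1.0, size=(900, 8))
X_minor = rng.uniform(0.45, 0.55, size=(100, 8))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 900 + [1] * 100)

entropies = per_class_entropy(X, y)
entropy_gap = max(entropies.values()) - min(entropies.values())
print(entropies, entropy_gap)
```

In this toy setup the minority class differs from the majority class not only in sample count but also in how much of the value range it covers, which is the kind of information imbalance the abstract describes; a resampling step would aim to shrink `entropy_gap`.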
Pages: 95-109
Page count: 15