Data Entropy-Based Imbalanced Learning

Times cited: 0
Authors
Fan, Yutao [1, 2, 3, 4]
Huang, Heming [1, 2, 3]
Affiliations
[1] Qinghai Normal Univ, Xining 810008, Peoples R China
[2] State Key Lab Tibetan Intelligent Informat Proc &, Xining 810008, Peoples R China
[3] Minist Educ, Key Lab Tibetan Informat Proc, Xining 810008, Peoples R China
[4] North China Inst Sci & Technol, Beijing 065201, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data entropy; deep learning; imbalanced learning; networks
DOI
10.1007/978-3-031-67871-4_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The skewed distribution of observations across classes has long been regarded as the cause of poor classification performance in machine learning, especially of the performance bias among classes. However, our recent study challenges this notion. We argue that the bias in classification performance stems from the imbalance of information among classes rather than merely the imbalance of observations. To reflect this information imbalance, we propose an indicator, data entropy, that captures the randomness within each class. A dataset whose classes have balanced and higher data entropies is more likely to exhibit improved classification performance. Furthermore, we propose a second indicator, data mutual information, that quantifies the similarity between classes. Higher values indicate that a model can leverage what it learns across classes to enhance its learning capacity. Therefore, reducing the difference in data entropy between classes while increasing data mutual information is advantageous for classification. Our experiments with four models (SVM, CNN, Transformer including its variant ViT, and DNN) on four datasets (CIFAR-10, Airline Satisfaction, Smoking Body Signal, and Liver Cirrhosis) validate the efficacy of the proposed indicators. By applying resampling techniques to the Liver Cirrhosis dataset to rebalance the distribution of data entropy among classes, raise the data entropy within classes, and increase data mutual information, we observe improvements in classification performance, measured by the d-index, across all four models.
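The record does not reproduce the paper's formal definition of data entropy, so the following Python sketch only illustrates the general idea of measuring randomness within each class and the entropy gap between classes. It assumes a simple stand-in: each feature is discretized into equal-width bins computed over the whole dataset, the Shannon entropy of the bin labels is computed within each class, and the per-feature entropies are averaged. The names `shannon_entropy`, `discretize`, `per_class_entropy`, `entropy_gap`, and the `bins` parameter are hypothetical, and this proxy is not the authors' method.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    # p * log2(1/p), written as (c/n) * log2(n/c) to stay non-negative
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def discretize(x, lo, hi, bins):
    """Map a real value to one of `bins` equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((x - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # x == hi falls into the last bin


def per_class_entropy(X, y, bins=4):
    """HYPOTHETICAL proxy for 'data entropy' (not the paper's definition):
    for each class, the mean per-feature Shannon entropy after equal-width
    discretization. Bin edges come from the whole dataset so that the
    per-class values share a common scale."""
    n_features = len(X[0])
    lows = [min(row[j] for row in X) for j in range(n_features)]
    highs = [max(row[j] for row in X) for j in range(n_features)]

    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    result = {}
    for label, rows in by_class.items():
        ents = []
        for j in range(n_features):
            symbols = [discretize(r[j], lows[j], highs[j], bins) for r in rows]
            ents.append(shannon_entropy(symbols))
        result[label] = sum(ents) / n_features
    return result


def entropy_gap(entropies):
    """Difference between the highest and lowest per-class entropy."""
    return max(entropies.values()) - min(entropies.values())


if __name__ == "__main__":
    # Toy data: class "a" is spread out, class "b" is tightly clustered.
    X = [[0.0], [1.0], [2.0], [3.0], [0.1], [0.2]]
    y = ["a", "a", "a", "a", "b", "b"]
    ents = per_class_entropy(X, y, bins=4)
    print(ents)
    print(entropy_gap(ents))
```

In the toy run, class `a` occupies four distinct bins while class `b` falls into a single bin, so under this proxy the two classes differ sharply in entropy. Using dataset-wide bin edges is a deliberate choice: per-class bin edges would rescale each class independently and could make a tightly clustered class look as random as a spread-out one.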
Pages: 95-109
Number of pages: 15