A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Authors
Ren, Weishuo [1 ,2 ]
Zheng, Yifeng [1 ,2 ]
Zhang, Wenjie [1 ,2 ]
Qing, Depeng [1 ,2 ]
Zeng, Xianlong [1 ,2 ]
Li, Guohe [3 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China
[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
Keywords
Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; CLASSIFICATION;
DOI
10.1016/j.neucom.2024.128717
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become a hot research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest neighbor samples, without considering the impact of the data distribution. To address these issues, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample serves as the center of a sphere, and Chebyshev inequality is used to ensure that synthetic samples fall within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that our proposal achieves better performance than the compared approaches.
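The abstract's first step (a sphere centered on the seed sample, with synthetic samples kept within m standard deviations) can be illustrated with a minimal sketch. This is not the MLCIO algorithm; the record does not give its details. The function names, the use of seed-to-neighbor distances to estimate the standard deviation, and the uniform sampling inside the ball are all assumptions made for illustration. The only fact relied on is Chebyshev's inequality itself: for a random variable with mean μ and standard deviation σ, P(|X − μ| ≥ mσ) ≤ 1/m².

```python
import math
import random
import statistics


def chebyshev_bound(m: float) -> float:
    """Upper bound on P(|X - mu| >= m * sigma) given by Chebyshev's inequality."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / (m * m))


def synthesize_near_seed(seed, neighbors, m=2.0, rng=None):
    """Draw one synthetic point inside a ball of radius m * sigma around `seed`.

    Assumption for this sketch: sigma is the population standard deviation
    of the Euclidean distances from the seed to its neighbors.
    """
    rng = rng or random.Random()
    dists = [math.dist(seed, nb) for nb in neighbors]
    sigma = statistics.pstdev(dists)
    radius = m * sigma

    # Random direction: a Gaussian vector normalized to unit length.
    direction = [rng.gauss(0.0, 1.0) for _ in seed]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0

    # Scaling by u**(1/dim) makes the point uniform inside the ball.
    r = radius * rng.random() ** (1.0 / len(seed))
    return [s + r * d / norm for s, d in zip(seed, direction)]


if __name__ == "__main__":
    seed = [0.0, 0.0]
    neighbors = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
    point = synthesize_near_seed(seed, neighbors, m=2.0, rng=random.Random(0))
    print("synthetic point:", point)
    print("Chebyshev tail bound for m=2:", chebyshev_bound(2.0))
```

Under these assumptions, a larger m widens the sampling region, while the Chebyshev bound 1/m² on the mass lying outside m standard deviations shrinks. How the paper actually chooses m and estimates the standard deviation is not stated in this record.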
Pages: 16