A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Cited by: 0
Authors
Ren, Weishuo [1 ,2 ]
Zheng, Yifeng [1 ,2 ]
Zhang, Wenjie [1 ,2 ]
Qing, Depeng [1 ,2 ]
Zeng, Xianlong [1 ,2 ]
Li, Guohe [3 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China
[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
Keywords
Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; CLASSIFICATION;
DOI
10.1016/j.neucom.2024.128717
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become an active research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest neighbors, without considering the impact of the data distribution. To address this issue, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample serves as the center of a sphere, and Chebyshev inequality is used to ensure that synthetic samples fall within m standard deviations of it. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that our proposal achieves better performance than other approaches.
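The role of the Chebyshev bound in the abstract can be illustrated with a small sketch. Chebyshev's inequality states that P(|X - mu| >= m * sigma) <= 1 / m^2 for any distribution with finite variance, so a region of m standard deviations covers at least 1 - 1/m^2 of the probability mass. The code below is not the authors' MLCIO algorithm; it is a minimal illustration under stated assumptions. The function names `chebyshev_bound` and `synthesize_around_seed`, the use of per-feature standard deviations from a reference set, and the uniform-in-ball sampling scheme are all hypothetical choices made for this example.

```python
import numpy as np


def chebyshev_bound(m):
    """Chebyshev upper bound on P(|X - mu| >= m * sigma)."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / m**2)


def synthesize_around_seed(seed, reference, m=2.0, n_samples=5, rng=None):
    """Draw synthetic points around `seed` (hypothetical helper).

    Each offset is drawn uniformly from the unit ball and then scaled
    per feature by m * sigma, where sigma is the standard deviation of
    `reference`. Every synthetic point therefore lies within m standard
    deviations of the seed along each feature.
    """
    rng = np.random.default_rng(rng)
    seed = np.asarray(seed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    sigma = reference.std(axis=0)  # assumption: per-feature spread
    d = seed.shape[0]

    # Uniform sampling inside the unit ball: random direction, radius U^(1/d).
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(n_samples) ** (1.0 / d)
    unit_ball = directions * radii[:, None]

    return seed + unit_ball * (m * sigma)


# Toy minority-class feature matrix (made-up values for illustration only).
X_min = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
synthetic = synthesize_around_seed(X_min[1], X_min, m=2.0, n_samples=50, rng=0)
```

With m = 2, `chebyshev_bound(2)` gives 0.25, so at least 75% of the mass of any finite-variance distribution lies within two standard deviations of its mean. A larger m widens the sampling region and increases diversity at the cost of locality around the seed.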
Pages: 16
Related papers
50 in total
  • [1] A Novel Cluster based Over-sampling Approach for Classifying Imbalanced Sentiment Data
    Chang, Jing-Rong
    Chen, Long-Sheng
    Lin, Li-Wei
    IAENG International Journal of Computer Science, 2021, 48 (04)
  • [2] SORAG: Synthetic Data Over-Sampling Strategy on Multi-Label Graphs
    Duan, Yijun
    Liu, Xin
    Jatowt, Adam
    Yu, Hai-tao
    Lynden, Steven
    Kim, Kyoung-Sook
    Matono, Akiyoshi
    REMOTE SENSING, 2022, 14 (18)
  • [3] An Imbalanced Multi-Label Data Ensemble Learning Method Based on Safe Under-Sampling
    Sun, Zhong-Bin
    Diao, Yu-Xuan
    Ma, Su-Yang
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (10) : 3392 - 3408
  • [4] Anonymity can Help Minority: A Novel Synthetic Data Over-Sampling Strategy on Multi-label Graphs
    Duan, Yijun
    Liu, Xin
    Jatowt, Adam
    Yu, Hai-Tao
    Lynden, Steven
    Kim, Kyoung-Sook
    Matono, Akiyoshi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 13714 : 20 - 36
  • [5] An Approach to Imbalanced Data Classification Based on Instance Selection and Over-Sampling
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT I, 2019, 11683 : 601 - 610
  • [6] A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification
    Zhang, Huaxiang
    Wang, Zhichao
    ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 83 - 96
  • [7] A virtual multi-label approach to imbalanced data classification
    Chou, Elizabeth P.
    Yang, Shan-Ping
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024, 53 (03) : 1461 - 1471
  • [8] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
    Chen, Junfeng
    Zheng, Zhongtuan
    Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
  • [9] Denoise-Based Over-Sampling for Imbalanced Data Classification
    Dan, Wang
    Yian, Liu
    2020 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES 2020), 2020 : 275 - 278
  • [10] Over-sampling algorithm for imbalanced data classification
    Xu Xiaolong
    Chen Wen
    Sun Yanfei
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2019, 30 (06) : 1182 - 1191