A novel ensemble over-sampling approach based Chebyshev inequality for imbalanced multi-label data

Cited by: 0
Authors
Ren, Weishuo [1, 2]
Zheng, Yifeng [1, 2]
Zhang, Wenjie [1, 2]
Qing, Depeng [1, 2]
Zeng, Xianlong [1, 2]
Li, Guohe [3]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Fujian, Peoples R China
[2] Fujian Prov Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[3] China Univ Petr, Coll Informat Sci & Engn, Beijing 102249, Peoples R China
Keywords
Multi-label classification; Imbalanced data; Over-sampling approach; Chebyshev inequality; Group optimization strategy; Classification
DOI
10.1016/j.neucom.2024.128717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the development of intelligent technology, data increasingly exhibit multi-label and imbalanced distributions, which degrade the performance of classification models. Addressing multi-label class imbalance has therefore become an active research topic. Over-sampling approaches deal with imbalanced data by generating a superset of the original dataset. However, traditional over-sampling methods synthesize samples using only the central data point and its nearest neighbors, without considering the data distribution. To address this issue, we propose an ensemble multi-label over-sampling algorithm (MLCIO) based on the Chebyshev inequality and a group optimization strategy. First, to generate more representative and diverse samples, the seed sample is taken as the center of a sphere, and the Chebyshev inequality is used to ensure that synthetic samples fall within m times the standard deviation of that center. Second, a group optimization ranking weighting approach is employed to obtain more reliable and stable label information. Finally, comparative experiments are conducted on 11 imbalanced datasets from various domains using different evaluation metrics. The results demonstrate that the proposed method achieves better performance than the other approaches.
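The sampling idea in the abstract can be illustrated with a small sketch. Chebyshev's inequality states that for any distribution with finite variance, P(|X - mean| >= m * std) <= 1 / m^2, so a region of m standard deviations around a point covers at least 1 - 1/m^2 of the probability mass whatever the distribution's shape. The code below is only an illustration under stated assumptions, not the MLCIO algorithm: the function names, the per-feature scaling of the sphere, and the uniform sampling inside it are hypothetical choices, and the label-assignment step (the group optimization ranking weighting) is not covered.

```python
import math
import random
import statistics


def chebyshev_tail_bound(m):
    """Chebyshev upper bound on P(|X - mean| >= m * std), capped at 1."""
    if m <= 0:
        raise ValueError("m must be positive")
    return min(1.0, 1.0 / (m * m))


def sample_in_unit_ball(dim, rng):
    """Draw a point uniformly from the unit ball in `dim` dimensions."""
    norm = 0.0
    while norm == 0.0:  # redraw in the (practically impossible) all-zero case
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
    # Radius drawn as U^(1/dim) makes the point uniform in volume.
    radius = rng.random() ** (1.0 / dim)
    return [radius * v / norm for v in direction]


def synthesize(seed, samples, m, rng):
    """Hypothetical generator: place a synthetic feature vector around `seed`.

    Assumption: the sphere is stretched per feature by m times that feature's
    population standard deviation, estimated from `samples`.
    """
    dim = len(seed)
    stds = [statistics.pstdev([row[j] for row in samples]) for j in range(dim)]
    offset = sample_in_unit_ball(dim, rng)
    return [seed[j] + m * stds[j] * offset[j] for j in range(dim)]


# Usage: three toy minority samples, seed is the middle one, m = 2.
rng = random.Random(0)
minority = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
synthetic = synthesize(minority[1], minority, 2.0, rng)
```

With m = 2, `chebyshev_tail_bound(2)` gives 0.25. Under this sketch's assumptions, at most a quarter of any feature's mass lies farther than two standard deviations from its mean, while each synthetic sample is confined to two standard deviations of the seed.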
Pages: 16