A virtual multi-label approach to imbalanced data classification

Authors
Chou, Elizabeth P. [1 ]
Yang, Shan-Ping [1 ]
Affiliations
[1] Natl Chengchi Univ, Dept Stat, Taipei, Taiwan
Keywords
Imbalance; Classification; Virtual multi-label; Equal k-means; SUPPORT
DOI
10.1080/03610918.2022.2049820
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Imbalanced data analysis is one of the most challenging problems in machine learning. In this type of research, correctly predicting minority labels is usually more critical than correctly predicting majority labels. Traditional machine learning techniques, however, are prone to learning bias: traditional classifiers tend to place all subjects in the majority group, which results in biased predictions. Machine learning studies of this problem typically take one of two perspectives, data-based or model-based. Oversampling and undersampling are examples of data-based approaches, while adding costs, penalties, or weights to optimize the algorithm is typical of a model-based approach. Some ensemble methods have also been studied recently. These methods can cause various problems, such as overfitting, the omission of some information, and long computation times, and they do not apply to all kinds of datasets. To address these problems, the virtual labels (ViLa) approach, applied to the majority label, is proposed for the imbalanced problem. The study demonstrates a new multiclass classification approach that uses the equal K-means clustering method. The proposed method is compared with commonly used methods for the imbalance problem, namely sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases, gradually outperforming the other methods.
Pages: 1461-1471
Page count: 11
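Illustrative sketch
The abstract does not spell out the procedure, so the following Python sketch shows only one possible reading of it: the majority class is split into equal-sized clusters, each cluster becomes a virtual class, a multiclass classifier is trained on the minority class plus the virtual classes, and any virtual prediction is mapped back to "majority". Everything here is an assumption for illustration: the function names `equal_kmeans`, `fit_virtual_labels`, and `predict`, the greedy size-capped assignment standing in for equal K-means, the evenly spaced initial centers, the nearest-centroid classifier, and the toy data. It is not the authors' implementation.

```python
import math

def equal_kmeans(points, k, iters=10):
    """Size-capped k-means: no cluster may hold more than ceil(n / k) points.

    Assumption: greedy capped assignment stands in for the paper's
    equal K-means, whose exact procedure the abstract does not give.
    """
    n = len(points)
    cap = math.ceil(n / k)
    # Simplification: evenly spaced input points serve as initial centers.
    centers = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Visit (point, center) pairs from closest to farthest and assign
        # each point to the first center that still has room.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centers)
        )
        sizes = [0] * k
        assigned = [False] * n
        for _, i, j in pairs:
            if not assigned[i] and sizes[j] < cap:
                labels[i], assigned[i] = j, True
                sizes[j] += 1
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def fit_virtual_labels(majority, minority, k):
    """Return one centroid per class: 'minority' plus k virtual majority classes."""
    groups = {"minority": list(minority)}
    for p, v in zip(majority, equal_kmeans(majority, k)):
        groups.setdefault(f"virtual_{v}", []).append(p)
    return {
        name: tuple(sum(x) / len(pts) for x in zip(*pts))
        for name, pts in groups.items()
    }

def predict(centroids, x):
    """Nearest-centroid prediction; every virtual class maps back to 'majority'."""
    name = min(centroids, key=lambda c: math.dist(centroids[c], x))
    return "minority" if name == "minority" else "majority"

# Toy data (hypothetical): 12 majority points in three groups, 3 minority points.
majority = [(10, 0), (11, 0), (10, 1), (11, 1),
            (-10, 0), (-11, 0), (-10, 1), (-11, 1),
            (0, 10), (1, 10), (0, 11), (1, 11)]
minority = [(0, 0), (1, 0), (0, 1)]

virtual = equal_kmeans(majority, 3)
model = fit_virtual_labels(majority, minority, 3)
print(predict(model, (0.2, 0.1)), predict(model, (9.5, 0.5)))
```

With three virtual classes of four points each, the classifier sees four classes of similar size (4, 4, 4, and 3) instead of a 12-to-3 split, which is the intuition behind relabeling the majority class in this sketch.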