Multilabel Over-sampling and Under-sampling with Class Alignment for Imbalanced Multilabel Text Classification

Cited by: 11
Authors
Taha, Adil Yaseen [1]
Tiun, Sabrina [1]
Abd Rahman, Abdul Hadi [1]
Sabah, Ali [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Selangor, Malaysia
Keywords
Data mining; multilabel text classification; class imbalance problem; resampling method; class alignment; feature selection; data sets; label; categorization; insight; SMOTE
DOI
10.32890/jict2021.20.3.6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multilabel text classification, which assigns multiple labels to a document simultaneously, does not perform optimally when the class distribution is highly imbalanced. Class imbalance means that the underlying data distribution is skewed, which makes classification more difficult. Random over-sampling and under-sampling are common approaches to the class imbalance problem, but both have drawbacks: under-sampling is likely to discard useful data, whereas over-sampling can increase the risk of overfitting. A method that avoids both discarding useful data and overfitting is therefore needed. This study proposes a method that tackles the class imbalance problem by combining multilabel over-sampling and under-sampling with class alignment (ML-OUSCA). Instead of using all the training instances, ML-OUSCA draws a new training set by over-sampling small classes and under-sampling large classes. ML-OUSCA was evaluated using average precision, average recall, and average F-measure on three benchmark datasets: Reuters-21578, Bibtex, and Enron. The experimental results showed that ML-OUSCA outperformed the chosen baseline resampling approaches, K-means SMOTE and KNN-US. These results indicate that a resampling method designed around class imbalance together with class alignment improves multilabel classification more than random resampling alone.
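The abstract states only that ML-OUSCA over-samples small classes and under-samples large ones; it does not give the class-alignment rule or the sampling targets. The Python sketch below therefore illustrates generic combined multilabel over- and under-sampling under an assumed rule, not the authors' method: the target size for every label is the rounded mean label frequency, an instance is dropped only if all of its labels stay at or above that target, and instances carrying a below-target label are cloned until that label reaches the target. It also computes macro-averaged precision, recall, and F-measure, assuming "average" means a per-label macro average. All function names (`label_counts`, `resample_multilabel`, `macro_prf`) are hypothetical, and no class-alignment step is modelled.

```python
import random
from collections import Counter


def label_counts(Y):
    """Number of instances carrying each label (Y: list of label sets)."""
    counts = Counter()
    for labels in Y:
        counts.update(labels)
    return counts


def resample_multilabel(X, Y, seed=0):
    """Hypothetical combined over-/under-sampling sketch; NOT ML-OUSCA.

    Assumed target: the rounded mean label frequency. No class alignment.
    """
    rng = random.Random(seed)
    counts = label_counts(Y)
    target = round(sum(counts.values()) / len(counts))

    # Under-sampling: visit instances in random order and drop one only if
    # every label it carries stays at or above the target afterwards,
    # so below-target (small) labels never lose instances.
    order = list(range(len(X)))
    rng.shuffle(order)
    keep = set(order)
    for i in order:
        if Y[i] and all(counts[l] - 1 >= target for l in Y[i]):
            keep.discard(i)
            counts.subtract(Y[i])
    new_X = [X[i] for i in sorted(keep)]
    new_Y = [set(Y[i]) for i in sorted(keep)]

    # Over-sampling: clone random instances of each below-target label until
    # it reaches the target (clones also raise co-occurring label counts).
    for label in sorted(counts):
        pool = [j for j, labels in enumerate(new_Y) if label in labels]
        while counts[label] < target:
            j = rng.choice(pool)
            new_X.append(new_X[j])
            new_Y.append(set(new_Y[j]))
            counts.update(new_Y[j])
    return new_X, new_Y


def macro_prf(Y_true, Y_pred, labels):
    """Macro-averaged precision, recall, and F-measure over `labels`."""
    p_sum = r_sum = f_sum = 0.0
    for l in labels:
        pairs = list(zip(Y_true, Y_pred))
        tp = sum(1 for yt, yp in pairs if l in yt and l in yp)
        fp = sum(1 for yt, yp in pairs if l not in yt and l in yp)
        fn = sum(1 for yt, yp in pairs if l in yt and l not in yp)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum, r_sum, f_sum = p_sum + p, r_sum + r, f_sum + f
    n = len(labels)
    return p_sum / n, r_sum / n, f_sum / n


# Toy usage: "earn" is over-represented, "grain" under-represented.
X = [f"doc{i}" for i in range(8)]
Y = [{"earn"}, {"earn"}, {"earn"}, {"earn"}, {"earn", "acq"},
     {"acq"}, {"acq"}, {"grain"}]
new_X, new_Y = resample_multilabel(X, Y, seed=1)
print(sorted(label_counts(new_Y).items()))  # [('acq', 3), ('earn', 3), ('grain', 3)]
```

The drop condition is deliberately conservative: because one document can carry several labels, removing it to shrink a large label could also shrink a small one, and cloning it to grow a small label can inflate its co-occurring labels. This label co-occurrence is one reason multilabel resampling is harder than its single-label counterpart.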
Pages: 423-456
Number of pages: 34