A lightweight filter based feature selection approach for multi-label text classification

Cited by: 4
Authors
Dhal P. [1]
Azad C. [1]
Affiliation
[1] Department of Computer Science and Engineering, National Institute of Technology, Jamshedpur
Keywords
Chi-square based feature selection; multi-label text classification; multi-layer perceptron
DOI
10.1007/s12652-022-04335-5
Abstract
Multi-label Text Classification (MTC) is a challenging task in Natural Language Processing (NLP): the goal is to assign each document a set of labels. Applying term weighting schemes in MTC produces a high-dimensional feature space, which poses substantial problems for multi-label learning algorithms. Feature Selection (FS) approaches are an effective solution to this problem. This paper proposes a Lightweight Term-weighting FS (LwTwFS) approach based on a modified Chi-square (CHI) filter-based FS method. The modified CHI approach accounts for Inter-Class Concentration (ICC) and Intra-Class Dispersion (ICD), and it is strengthened by incorporating both positive and negative correlations. A novel modified equation is introduced to distribute the features among the categories (here, the multiple labels) in the corpus. The proposed modified CHI-based FS approach operates on the features produced by a term weighting-based Feature Extraction (FE) approach. A Multi-Layer Perceptron (MLP) is used in the classification phase because of its adaptive learning property, that is, its ability to learn how to perform a task from the data provided during training or from prior experience. We used two publicly available multi-label corpora for experimental verification: the Arxiv Academic Paper Dataset (AAPD) and the Reuters Corpus Volume I (RCV1-V2). The results show that the LwTwFS approach combined with the MLP classifier outperforms the other combinations in terms of Jaccard Score (JS), Hamming Loss (HL), Ranking Loss (RL), Precision (Pr), Recall (Re), micro-averaged F-score (F-micro), and macro-averaged F-score (F-macro). On the AAPD corpus, LwTwFS achieves the best JS, HL, RL, Pr, F-micro, and F-macro values: 0.9636, 0.0121, 0.0303, 0.9636, 0.9882, and 0.9894, respectively. On the RCV1-V2 corpus, LwTwFS achieves JS, Pr, Re, F-micro, and F-macro values of 1.0000 and HL and RL values of 0.0000.
Empirical results on two widely used benchmark multi-label text corpora show that LwTwFS achieves competitive performance, especially when labels are limited. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
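The abstract says the FS step operates on term weighting-based features but does not name the weighting scheme. The sketch below assumes TF-IDF as a representative example. It uses tf = count / document length and idf = ln(N / df), and `tfidf_weights` is a hypothetical helper name. This is not the authors' feature extractor.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term of each tokenized document by TF-IDF.

    Assumed variant (not taken from the paper): tf = raw count / document length,
    idf = ln(N / df). A term that occurs in every document therefore gets weight 0.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        if not doc:
            weighted.append({})  # empty document: no features
            continue
        counts = Counter(doc)
        weighted.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weighted
```

For example, `tfidf_weights([["neural", "network", "neural"], ["network", "market"]])` gives "neural" a weight of about 0.462 in the first document. "network" gets 0.0 because it appears in both documents.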
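The abstract does not give the modified CHI equation, including its ICC/ICD terms and the equation that distributes features among labels. The sketch below therefore shows only the standard chi-square filter the method builds on, plus a sign that separates positive from negative term-label correlation. Take a term t and a label c over N documents. Let A count documents containing t with label c, B documents containing t without c, C documents with c but not t, and D the rest. The statistic is then χ²(t, c) = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)). Several choices here are assumptions: scoring each label one-vs-rest, keeping each term's maximum signed score, and the helper names `signed_chi_square` and `select_features`. This is not the LwTwFS method itself.

```python
def signed_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a term/label 2x2 table, signed by correlation.

    a: docs with the term and the label    b: docs with the term, without the label
    c: docs with the label, without term   d: docs with neither
    The sign is + when the term co-occurs with the label more than expected
    (a*d > b*c) and - when it co-occurs less than expected.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or label present in all or no documents
    diff = a * d - b * c
    score = n * diff * diff / denom
    return score if diff >= 0 else -score


def select_features(docs: list[set[str]], labels: list[set[str]], k: int) -> list[str]:
    """Keep the k terms whose best one-vs-rest signed chi-square score is highest.

    Assumed aggregation: max over labels. The paper's modified CHI equation is not used here.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    label_names = sorted(set().union(*labels))
    best = {}
    for term in vocab:
        has_term = [term in doc for doc in docs]
        scores = []
        for lab in label_names:
            has_lab = [lab in labs for labs in labels]
            a = sum(t and l for t, l in zip(has_term, has_lab))
            b = sum(t and not l for t, l in zip(has_term, has_lab))
            c = sum(l and not t for t, l in zip(has_term, has_lab))
            scores.append(signed_chi_square(a, b, c, n - a - b - c))
        best[term] = max(scores, default=0.0)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-best[t], t))[:k]
```

Consider four documents with term sets {neural, network}, {neural, market}, {stock, market}, {stock, price} and label sets {cs}, {cs, fin}, {fin}, {fin}. For these, `select_features(docs, labels, 2)` keeps "neural" (perfectly aligned with "cs") and then "market". The signed score matters because of "stock": it is strongly but negatively associated with "cs", yet that association does not raise its rank.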
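The reported metrics have standard multi-label definitions. The sketch below computes HL, sample-averaged JS, F-micro, F-macro, and RL in plain Python from 0/1 label matrices and per-label scores. The abstract does not state which variants the authors used. The handling of edge cases is an assumption, as are the helper names:

- Jaccard is 1.0 when a sample has no true and no predicted labels.
- F1 is 0.0 for a label that never occurs.
- Ranking loss counts ties as mis-ordered, and samples whose labels are all relevant or all irrelevant contribute 0.

For production use, `sklearn.metrics` provides `hamming_loss`, `jaccard_score`, `f1_score`, and `label_ranking_loss`.

```python
Matrix = list[list[int]]  # n_samples x n_labels, entries 0/1


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sample, label) slots predicted incorrectly."""
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / (len(y_true) * len(y_true[0]))


def jaccard_samples(y_true: Matrix, y_pred: Matrix) -> float:
    """Mean over samples of |true AND predicted| / |true OR predicted|."""
    total = 0.0
    for rt, rp in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(rt, rp) if t and p)
        union = sum(1 for t, p in zip(rt, rp) if t or p)
        total += 1.0 if union == 0 else inter / union  # assumed: empty vs empty = 1
    return total / len(y_true)


def f1_micro_macro(y_true: Matrix, y_pred: Matrix) -> tuple[float, float]:
    """Micro F1 pools TP/FP/FN over all labels; macro F1 averages per-label F1."""
    per_label = []
    for j in range(len(y_true[0])):
        pairs = [(rt[j], rp[j]) for rt, rp in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t and p)
        fp = sum(1 for t, p in pairs if p and not t)
        fn = sum(1 for t, p in pairs if t and not p)
        per_label.append((tp, fp, fn))

    def f1(tp: int, fp: int, fn: int) -> float:
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0  # assumed: 0 when label never occurs

    micro = f1(*(sum(col) for col in zip(*per_label)))
    macro = sum(f1(*c) for c in per_label) / len(per_label)
    return micro, macro


def ranking_loss(y_true: Matrix, scores: list[list[float]]) -> float:
    """Mean fraction of (relevant, irrelevant) label pairs ranked wrongly."""
    total = 0.0
    for rt, rs in zip(y_true, scores):
        rel = [s for t, s in zip(rt, rs) if t]
        irr = [s for t, s in zip(rt, rs) if not t]
        if rel and irr:  # samples with all or no relevant labels contribute 0
            bad = sum(1 for r in rel for i in irr if i >= r)  # ties count as errors
            total += bad / (len(rel) * len(irr))
    return total / len(y_true)
```

As a usage example, take `y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]` and `y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]`. For these, HL is 2/9, JS is 13/18, F-micro is 0.8, and F-macro is 2/3, because the third label gets an F1 of 0. This gap between F-micro and F-macro shows why papers often report both.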
Pages: 12345–12357
Number of pages: 12