A lightweight filter based feature selection approach for multi-label text classification

Cited by: 4
Authors
Dhal P. [1]
Azad C. [1]
Affiliations
[1] Department of Computer Science and Engineering, National Institute of Technology, Jamshedpur
Keywords
Chi-square based feature selection; Multi-label text classification; Multi-layer perceptron
DOI
10.1007/s12652-022-04335-5
Abstract
Multi-label Text Classification (MTC) is a challenging task in Natural Language Processing (NLP). The goal of MTC is to assign a set of labels to a document. The term weighting schemes used in MTC produce a high-dimensional feature space, which causes substantial problems for multi-label learning algorithms. Feature Selection (FS) approaches are an effective remedy. This paper proposes a Lightweight Term-weighting FS (LwTwFS) approach based on a modified Chi-square (CHI) filter-based FS method. The modified CHI approach accounts for Inter-Class Concentration (ICC) and Intra-Class Dispersion (ICD), and it is strengthened by adding positive and negative correlations. A novel modified equation is introduced to distribute the features among the categories (here, the multiple labels) in the corpus. The proposed modified CHI-based FS approach operates on the output of a term weighting-based Feature Extraction (FE) approach. A Multi-Layer Perceptron (MLP) is used in the classification phase because of its adaptive learning property, that is, its ability to learn how to perform tasks from data provided during training or from prior experience. We use two publicly available multi-label corpora for experimental verification: the Arxiv Academic Paper Dataset (AAPD) and the Reuters Corpus Volume I (RCV1-V2). According to the results, LwTwFS combined with the MLP classifier surpasses other combinations in terms of Jaccard Score (JS), Hamming Loss (HL), Ranking Loss (RL), Precision (Pr), Recall (Re), F-micro, and F-macro. For the AAPD corpus, the LwTwFS method achieves the best JS, HL, RL, Pr, F-micro, and F-macro values of 0.9636, 0.0121, 0.0303, 0.9636, 0.9882, and 0.9894, respectively. For the RCV1-V2 corpus, it achieves the best JS, Pr, Re, F-micro, and F-macro values of 1.0000 and the best HL and RL values of 0.0000.
Empirical results on two widely used benchmark multi-label text corpora show that LwTwFS achieves competitive performance, especially when labels are limited. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
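As a rough illustration of the filter-based selection step and two of the evaluation metrics named in the abstract, the following Python sketch scores each term with the classic (unmodified) chi-square statistic over a 2x2 term/label contingency table, keeps the top-k terms, and computes Hamming Loss and a sample-averaged Jaccard Score. It is not the paper's modified CHI equation, which the abstract does not give. The function names (chi_square, select_terms, hamming_loss, jaccard_score), the "maximum over labels" aggregation, and the toy documents are all assumptions made for illustration.

```python
def chi_square(n11, n10, n01, n00):
    """Classic chi-square statistic for one term and one label.

    n11: documents containing the term and carrying the label
    n10: documents containing the term without the label
    n01: documents carrying the label without the term
    n00: documents with neither
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, k):
    """Rank terms by their maximum chi-square score over all labels.

    docs: list of token sets; labels: list of label sets (same length).
    Taking the maximum over labels is an assumed aggregation, not the paper's.
    Returns the top-k terms and the full score dictionary.
    """
    vocab = sorted(set().union(*docs))
    all_labels = sorted(set().union(*labels))
    n = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for lab in all_labels:
            n11 = sum(1 for d, y in zip(docs, labels) if term in d and lab in y)
            n10 = sum(1 for d, y in zip(docs, labels) if term in d and lab not in y)
            n01 = sum(1 for d, y in zip(docs, labels) if term not in d and lab in y)
            n00 = n - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


def hamming_loss(y_true, y_pred, all_labels):
    """Fraction of (document, label) pairs that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(all_labels))


def jaccard_score(y_true, y_pred):
    """Sample-averaged |intersection| / |union| of true and predicted label sets."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        union = t | p
        total += len(t & p) / len(union) if union else 1.0
    return total / len(y_true)


# Toy corpus (hypothetical): four documents as token sets with label sets.
docs = [
    {"neural", "network", "training"},
    {"neural", "proof"},
    {"proof", "theorem"},
    {"theorem", "network"},
]
labels = [{"cs.LG"}, {"cs.LG", "math.LO"}, {"math.LO"}, {"math.LO"}]

top_terms, term_scores = select_terms(docs, labels, k=2)
print(top_terms)
print(hamming_loss([{"a"}, {"a", "b"}], [{"a"}, {"b"}], ["a", "b"]))
print(jaccard_score([{"a"}, {"a", "b"}], [{"a"}, {"b"}]))
```

Because the classic statistic squares the difference n11*n00 - n10*n01, a term that is perfectly absent from a label's documents scores as high as one that is perfectly present. This is one reason a variant that distinguishes positive from negative correlation, as the abstract describes, can be useful.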
Pages: 12345-12357
Number of pages: 12