A lightweight filter based feature selection approach for multi-label text classification

Cited by: 4
Authors
Dhal P. [1]
Azad C. [1]
Affiliation
[1] Department of Computer Science and Engineering, National Institute of Technology, Jamshedpur
Keywords
Chi-square-based feature selection; multi-label text classification; multi-layer perceptron
DOI
10.1007/s12652-022-04335-5
Abstract
Multi-label Text Classification (MTC) is a challenging task in Natural Language Processing (NLP): the goal is to assign each document a set of labels. Applying term-weighting schemes in MTC generates a high-dimensional feature space, which causes substantial problems for multi-label learning algorithms. Feature Selection (FS) is an effective way to address this. This paper proposes a Lightweight Term-weighting FS (LwTwFS) approach based on a modified Chi-square (CHI) filter-based FS method. The modified CHI approach accounts for Inter-Class Concentration (ICC) and Intra-Class Dispersion (ICD), and its strength is increased by adding positive and negative correlations. A novel modified equation is introduced to distribute the features among the (multi-label) categories of the corpus. The proposed modified CHI-based FS approach operates on features produced by a term-weighting-based Feature Extraction (FE) approach. A Multi-Layer Perceptron (MLP) is used in the classification phase because of its adaptive learning property, i.e., its ability to learn how to perform a task from the data provided during training or from prior experience. We used two publicly available multi-label corpora for experimental verification: the Arxiv Academic Paper Dataset (AAPD) and the Reuters Corpus Volume I (RCV1-V2). According to the results, LwTwFS combined with the MLP classifier outperforms the other combinations in terms of Jaccard Score (JS), Hamming Loss (HL), Ranking Loss (RL), Precision (Pr), Recall (Re), F-micro, and F-macro. On the AAPD corpus, LwTwFS achieves the best JS, HL, RL, Pr, F-micro, and F-macro values of 0.9636, 0.0121, 0.0303, 0.9636, 0.9882, and 0.9894, respectively. On the RCV1-V2 corpus, it achieves the best JS, Pr, Re, F-micro, and F-macro values of 1.0000 and HL and RL values of 0.0000.
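For readers unfamiliar with CHI filtering, the following is a minimal sketch of the *standard, unmodified* chi-square filter applied to a multi-label corpus. It is not the paper's modified equation: the ICC/ICD terms and the positive/negative correlation weighting are not reproduced. As assumptions for illustration, terms are counted by binary presence, each term's per-label scores are aggregated by their maximum, and the names `chi_square` and `select_features` and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard 2x2 chi-square statistic for one (term, label) pair.

    a: documents that contain the term and carry the label
    b: documents that contain the term but lack the label
    c: documents that lack the term but carry the label
    d: documents that have neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: term or label is present everywhere or nowhere
    # The sign of (a*d - b*c) distinguishes positive from negative association;
    # the plain statistic squares it, so that sign is discarded here.
    return n * (a * d - b * c) ** 2 / denom


def select_features(
    docs: List[Set[str]], labels: List[Set[str]], k: int
) -> List[Tuple[str, float]]:
    """Rank terms by their maximum chi-square over all labels (an assumed
    aggregation) and keep the top k.

    docs[i] is the set of terms present in document i (binary presence);
    labels[i] is the label set of document i (multi-label).
    """
    vocab = sorted(set().union(*docs))
    label_space = sorted(set().union(*labels))
    pairs = list(zip(docs, labels))
    n = len(pairs)
    scores: Dict[str, float] = {}
    for term in vocab:
        best = 0.0
        for lab in label_space:
            a = sum(1 for doc, ys in pairs if term in doc and lab in ys)
            b = sum(1 for doc, ys in pairs if term in doc and lab not in ys)
            c = sum(1 for doc, ys in pairs if term not in doc and lab in ys)
            best = max(best, chi_square(a, b, c, n - a - b - c))
        scores[term] = best
    # Descending score; ties broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    docs = [{"neural", "network", "loss"}, {"neural", "graph"},
            {"stock", "market", "loss"}, {"stock", "bank"}]
    labels = [{"cs"}, {"cs"}, {"fin"}, {"fin"}]
    print(select_features(docs, labels, 2))  # [('neural', 4.0), ('stock', 4.0)]
```

In the toy corpus, "loss" occurs equally in both categories and scores 0, while "neural" and "stock" separate the labels perfectly and are kept. In a full pipeline the surviving terms would then be term-weighted and passed to the classifier; max aggregation is only one common choice, and averaging per-label scores is another.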
Empirical results on two widely used benchmark multi-label text corpora show that LwTwFS achieves competitive performance, especially when labels are limited. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
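The results above are reported in standard multi-label metrics. As a reference for how such numbers are usually computed from predicted label sets, here is a minimal pure-Python sketch of Hamming loss, sample-averaged Jaccard score, micro precision/recall/F1, and macro F1. The function name `multilabel_metrics`, the micro averaging of precision and recall, and the zero-division conventions (an empty union counts as a perfect match; labels absent from both truth and prediction are left out of the macro average) are assumptions, since the abstract does not state the paper's averaging choices. Ranking loss is omitted because it needs per-label scores rather than label sets.

```python
from typing import Dict, List, Set


def multilabel_metrics(
    y_true: List[Set[str]], y_pred: List[Set[str]], label_space: Set[str]
) -> Dict[str, float]:
    """Set-based multi-label metrics (hypothetical helper for illustration)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    # Hamming loss: fraction of (document, label) slots predicted wrongly.
    hamming = sum(len(t ^ p) for t, p in pairs) / (n * len(label_space))
    # Jaccard score averaged over documents; two empty sets count as a match.
    jaccard = sum(len(t & p) / len(t | p) if t | p else 1.0 for t, p in pairs) / n

    tp_all = fp_all = fn_all = 0
    per_label_f1: List[float] = []
    for lab in sorted(label_space):
        tp = sum(1 for t, p in pairs if lab in t and lab in p)
        fp = sum(1 for t, p in pairs if lab not in t and lab in p)
        fn = sum(1 for t, p in pairs if lab in t and lab not in p)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        if 2 * tp + fp + fn:  # skip labels absent from both truth and prediction
            per_label_f1.append(2 * tp / (2 * tp + fp + fn))

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "hamming_loss": hamming,
        "jaccard": jaccard,
        "precision_micro": ratio(tp_all, tp_all + fp_all),
        "recall_micro": ratio(tp_all, tp_all + fn_all),
        "f1_micro": ratio(2 * tp_all, 2 * tp_all + fp_all + fn_all),
        "f1_macro": sum(per_label_f1) / len(per_label_f1) if per_label_f1 else 0.0,
    }


if __name__ == "__main__":
    y_true = [{"cs"}, {"cs", "math"}, {"fin"}]
    y_pred = [{"cs"}, {"cs"}, {"fin", "math"}]
    print(multilabel_metrics(y_true, y_pred, {"cs", "math", "fin"}))
```

With these three documents, two of the nine (document, label) slots are wrong, so the Hamming loss is 2/9; micro F1 is 0.75, and macro F1 is 2/3 because the "math" label is never predicted correctly. Lower is better for Hamming loss and ranking loss, and higher is better for the others, which is why the abstract reports 0.0000 and 1.0000 as the ideal values on RCV1-V2.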
Pages: 12345-12357
Page count: 12
Related papers (50 in total)
  • [1] A COPRAS-based Approach to Multi-Label Feature Selection for Text Classification
    Mohanrasu, S. S.
    Janani, K.
    Rakkiyappan, R.
    [J]. MATHEMATICS AND COMPUTERS IN SIMULATION, 2024, 222 : 3 - 23
  • [2] Improving Multi-Label Medical Text Classification by Feature Selection
    Glinka, Kinga
    Wozniak, Rafal
    Zakrzewska, Danuta
    [J]. 2017 IEEE 26TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES - INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE), 2017 : 176 - 181
  • [3] Optimization approach for feature selection in multi-label classification
    Lim, Hyunki
    Lee, Jaesung
    Kim, Dae-Won
    [J]. PATTERN RECOGNITION LETTERS, 2017, 89 : 25 - 30
  • [4] Ensemble feature selection for multi-label text classification: An intelligent order statistics approach
    Miri, Mohsen
    Dowlatshahi, Mohammad Bagher
    Hashemi, Amin
    Rafsanjani, Marjan Kuchaki
    Gupta, Brij B.
    Alhalabi, W.
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (12) : 11319 - 11341
  • [5] A Feature Selection Method for Multi-Label Text Based on Feature Importance
    Zhang, Lu
    Duan, Qingling
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (04):
  • [6] Feature Selection for Multi-label Classification Problems
    Doquire, Gauthier
    Verleysen, Michel
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2011, PT I, 2011, 6691 : 9 - 16
  • [7] Feature Selection for Hierarchical Multi-label Classification
    da Silva, Luan V. M.
    Cerri, Ricardo
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021, 2021, 12695 : 196 - 208
  • [8] An Ensemble Embedded Feature Selection Method for Multi-Label Clinical Text Classification
    Guo, Yumeng
    Chung, Fulai
    Li, Guozheng
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016 : 823 - 826
  • [9] Multi-label text classification with an ensemble feature space
    Tandon, Kushagri
    Chatterjee, Niladri
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (05) : 4425 - 4436