A new feature selection method for handling redundant information in text classification

被引:0
|
作者
You-wei Wang
Li-zhou Feng
机构
[1] Central University of Finance and Economics,School of Information
[2] Tianjin University of Finance and Economics,School of Science and Engineering
关键词
Feature selection; Dimensionality reduction; Text classification; Redundant features; Support vector machine; Naïve Bayes; Mutual information; TP391;
D O I
暂无
中图分类号
学科分类号
摘要
Feature selection is an important approach to dimensionality reduction in the field of text classification. Because of the difficulty in handling the problem that the selected features always contain redundant information, we propose a new simple feature selection method, which can effectively filter the redundant features. First, to calculate the relationship between two words, the definitions of word frequency based relevance and correlative redundancy are introduced. Furthermore, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, to improve the execution speed, the redundant features in FS1 are filtered by combining a predetermined threshold, and the filtered features are memorized in the linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578) where in support vector machines and naïve Bayes are used. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and the OFS methods. Moreover, the proposed method runs faster than typical mutual information-based methods (improved and normalized mutual information-based feature selections, and multilabel feature selection based on maximum dependency and minimum redundancy) while simultaneously ensuring classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
引用
收藏
页码:221 / 234
页数:13
相关论文
共 50 条
  • [1] A new feature selection method for handling redundant information in text classification
    Wang, You-wei
    Feng, Li-zhou
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2018, 19 (02) : 221 - 234
  • [2] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [3] A New Filter Feature Selection Method for Text Classification
    Cekik, Rasim
    [J]. IEEE Access, 2024, 12 : 139316 - 139335
  • [4] A new method for feature subset selection for handling classification problems
    Chen, SM
    Shie, JD
    [J]. FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005, : 183 - 188
  • [5] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [6] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01): : 103 - 110
  • [7] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    [J]. PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [8] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [9] Feature Selection for Text Classification Using Mutual Information
    Sel, Ilhami
    Karci, Ali
    Hanbay, Davut
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [10] A New Feature Selection Method for Text Classification Based on Independent Feature Space Search
    Liu, Yong
    Ju, Shenggen
    Wang, Junfeng
    Su, Chong
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020