A two-stage feature selection method for text categorization

Cited by: 43
Authors
Meng, Jiana [1, 2]
Lin, Hongfei [1]
Yu, Yuhai [1, 3]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
[2] Dalian Nationalities Univ, Coll Sci, Dalian 116600, Peoples R China
[3] Dalian Nationalities Univ, Sch Comp Sci & Engn, Dalian 116600, Peoples R China
Keywords
Feature selection; Text categorization; Latent semantic indexing; Support vector machine
DOI
10.1016/j.camwa.2011.07.045
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching represent each document as a vector in the vector space model; however, this representation needs a high-dimensional space and does not take into account the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing method can overcome this problem by replacing individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, in this paper we propose a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of the term space; then we construct a new semantic space between terms, based on the latent semantic indexing method. In applications involving spam database categorization, we find that our two-stage feature selection method performs better. (C) 2011 Elsevier Ltd. All rights reserved.
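The two-stage idea in the abstract (term selection first, then a latent semantic space over the surviving terms) can be illustrated with a minimal sketch. The abstract does not say which feature selection method the authors use in the first stage, so the chi-square score below is a stand-in chosen purely for illustration; the function names, the toy term counts, and the parameters `n_terms` and `n_concepts` are all assumptions, not taken from the paper. NumPy is assumed to be available.

```python
import numpy as np

def chi_square_scores(X, y):
    """Chi-square score of each term against a binary label.

    Stand-in for the paper's unspecified first-stage method.
    X: documents x terms count matrix, y: 1 for spam, 0 otherwise.
    """
    present = (np.asarray(X) > 0).astype(float)
    y = np.asarray(y).astype(bool)
    n = present.shape[0]
    a = present[y].sum(axis=0)    # spam docs containing the term
    b = present[~y].sum(axis=0)   # non-spam docs containing the term
    c = y.sum() - a               # spam docs without the term
    d = (~y).sum() - b            # non-spam docs without the term
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms that occur in every document (or none) get a score of 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def two_stage_reduce(X, y, n_terms, n_concepts):
    """Stage 1: keep the n_terms best-scoring terms.
    Stage 2: project documents onto n_concepts LSI dimensions (truncated SVD)."""
    X = np.asarray(X, dtype=float)
    scores = chi_square_scores(X, y)
    keep = np.sort(np.argsort(-scores, kind="stable")[:n_terms])
    X_sel = X[:, keep]
    U, s, Vt = np.linalg.svd(X_sel, full_matrices=False)
    k = min(n_concepts, len(s))
    doc_concepts = U[:, :k] * s[:k]   # documents in the latent semantic space
    return keep, doc_concepts, Vt[:k]

# Hypothetical toy data: columns are free, win, meeting, report, the, offer.
X = [
    [2, 1, 0, 0, 1, 1],
    [1, 2, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 2],
    [0, 0, 2, 1, 1, 0],
    [0, 0, 1, 2, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam

keep, doc_concepts, term_concepts = two_stage_reduce(X, y, n_terms=3, n_concepts=2)
print("kept term indices:", keep.tolist())
print("document vectors in concept space:")
print(doc_concepts)
```

The reduced document vectors in `doc_concepts` would then be passed to a classifier; the keywords mention a support vector machine, which is left out here to keep the sketch dependency-light.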
Pages: 2793 - 2800
Number of pages: 8
Related papers
50 in total
  • [21] Two-Stage Feature Selection with Unsupervised Second Stage
    Xu, Ke
    Maung, Crystal
    Arai, Hiromasa
    Schweitzer, Haim
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2018, 27 (07)
  • [22] A hybrid two-stage feature selection method based on differential evolution
    Qiu, Chenye
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (01) : 871 - 884
  • [23] Feature Selection Method Based on Crossed Centroid for Text Categorization
    Yang, Jieming
    Liu, Zhiying
    Qu, Zhaoyang
    Wang, Jing
    2014 15TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2014, : 11 - 15
  • [24] A New Feature Selection Method for Text Categorization of Customer Reviews
    Liu, Miao
    Lu, Xiaoling
    Song, Jie
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2016, 45 (04) : 1397 - 1409
  • [25] Improved Comprehensive Measurement Feature Selection Method for Text Categorization
    Feng, LiZhou
    Zuo, WanLi
    Wang, YouWei
    2015 INTERNATIONAL CONFERENCE ON NETWORK AND INFORMATION SYSTEMS FOR COMPUTERS (ICNISC), 2015, : 125 - 128
  • [26] Trigonometric comparison measure: A feature selection method for text categorization
    Kim, Kyoungok
    Zzang, See Young
    DATA & KNOWLEDGE ENGINEERING, 2019, 119 : 1 - 21
  • [27] Improving Farsi Multiclass Text Classification Using a Thesaurus and Two-Stage Feature Selection
    Maghsoodi, Nooshin
    Homayounpour, Mohammad Mehdi
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (10) : 2055 - 2066
  • [28] Feature selection in SVM text categorization
    Taira, H
    Haruno, M
    SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), 1999, : 480 - 486
  • [29] Feature selection strategies for text categorization
    Soucy, P
    Mineau, GW
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, 2671 : 505 - 509
  • [30] Two-stage Unsupervised Feature Selection Method Oriented to Manufacturing Procedural Data
    Zhang J.
    Sheng X.
    Zhang P.
    Qin W.
    Zhao X.
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2019, 55 (17) : 133 - 144