A two-stage feature selection method for text categorization

Cited by: 43
Authors
Meng, Jiana [1, 2]
Lin, Hongfei [1]
Yu, Yuhai [1, 3]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
[2] Dalian Nationalities Univ, Coll Sci, Dalian 116600, Peoples R China
[3] Dalian Nationalities Univ, Sch Comp Sci & Engn, Dalian 116600, Peoples R China
Keywords
Feature selection; Text categorization; Latent semantic indexing; Support vector machine
DOI
10.1016/j.camwa.2011.07.045
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching represent each document as a vector in the vector space model; however, this representation requires a high-dimensional space and ignores the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing (LSI) method can overcome this problem by replacing individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, this paper proposes a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of the term space; then we construct a new semantic space between terms based on the LSI method. In applications involving spam database categorization, we find that our two-stage feature selection method performs better. (C) 2011 Elsevier Ltd. All rights reserved.
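The abstract describes the pipeline only at a high level, and its first-stage selector is a novel method that is not specified here. The sketch below, assuming NumPy, substitutes the standard chi-square statistic for that first stage, then performs LSI as a rank-k truncated SVD of the reduced term-document matrix and folds documents into the latent space. The function names (`chi_square_scores`, `fit_two_stage`, `transform`), the toy spam/ham counts, and the parameters `n_terms` and `k` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def chi_square_scores(X, y):
    """Chi-square score of every term against binary labels y (1 = spam, 0 = ham).

    Hypothetical stand-in for the paper's first-stage selector, which is not specified.
    X is an (n_docs, n_terms) count matrix; a term is "present" if its count > 0.
    """
    present = (X > 0).astype(float)
    pos = y == 1
    n = float(len(y))
    a = present[pos].sum(axis=0)    # class-1 docs containing the term
    b = present[~pos].sum(axis=0)   # class-0 docs containing the term
    c = pos.sum() - a               # class-1 docs without the term
    d = (~pos).sum() - b            # class-0 docs without the term
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms present in every document (or in none) get a score of 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def fit_two_stage(X, y, n_terms, k):
    """Stage 1: keep the n_terms highest-scoring terms.
    Stage 2: LSI, i.e. a rank-k truncated SVD of the reduced term-document matrix."""
    scores = chi_square_scores(X, y)
    keep = np.sort(np.argsort(-scores, kind="stable")[:n_terms])
    A = X[:, keep].T.astype(float)  # terms x documents, the usual LSI layout
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return keep, U[:, :k], s[:k]


def transform(X, keep, U_k, s_k):
    """Fold documents into the k-dimensional latent space: d_hat = S_k^{-1} U_k^T d."""
    return (X[:, keep] @ U_k) / s_k


# Hypothetical toy corpus: columns are "free", "win", "meeting", "report", "the".
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 2, 1, 1],
    [0, 0, 1, 2, 1],
    [0, 1, 1, 1, 1],
])
y = np.array([1, 1, 1, 0, 0, 0])

keep, U_k, s_k = fit_two_stage(X, y, n_terms=3, k=2)
Z = transform(X, keep, U_k, s_k)  # one 2-d latent vector per document
print(keep)     # [0 1 2]  -> "free", "win", "meeting"
print(Z.shape)  # (6, 2)
```

The same folding-in formula applies to unseen documents, so new messages land in the space that a downstream classifier (the keywords list a support vector machine) would be trained on. Raw counts are used for brevity; LSI is commonly applied to weighted matrices such as TF-IDF instead.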
Pages: 2793-2800
Number of pages: 8
Related papers (50 in total)
  • [31] Feature Selection for SAR Target Discrimination and Efficient Two-Stage Detection Method
    Jeong, Nam-Hoon
    Choi, Jae-Ho
    Lee, Geon
    Park, Ji-Hoon
    Kim, Kyung-Tae
    REMOTE SENSING, 2022, 14 (16)
  • [32] A global-ranking local feature selection method for text categorization
    Pinheiro, Roberto H. W.
    Cavalcanti, George D. C.
    Correa, Renato F.
    Ren, Tsang Ing
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (17) : 12851 - 12857
  • [33] Adaptive Two-Stage Feature Selection for Sentiment Classification
    Chi, Xu
    Cambria, Erik
    Siew, Tan Puay
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1238 - 1243
  • [34] A New Approach of Feature Selection for Text Categorization
    Cui, Zifeng
    WUHAN UNIVERSITY JOURNAL OF NATURAL SCIENCES, 2006, (05) : 1335 - 1339
  • [35] Normalized and classified feature selection in text categorization
    Wang, XJ
    Guo, J
    Zheng, KF
    INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2005, VOLS 1 AND 2, PROCEEDINGS, 2005, : 173 - 176
  • [36] Improving Text Categorization by Multicriteria Feature Selection
    Doan, Son
    Horiguchi, Susumu
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2005, 9 (05) : 570 - 575
  • [37] A novel feature selection algorithm for text categorization
    Shang, Wenqian
    Huang, Houkuan
    Zhu, Haibin
    Lin, Yongmin
    Qu, Youli
    Wang, Zhihai
    EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) : 1 - 5
  • [38] Study on Feature Selection in Finance Text Categorization
    Sun, Changqiu
    Wang, Xiaolong
    Xu, Jun
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 5077 - 5082
  • [39] Study on constraints for feature selection in text categorization
    Xu, Yan
    Li, Jintao
    Wang, Bin
    Sun, Chunming
    Zhang, Sen
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2008, 45 (04) : 596 - 602
  • [40] Words as rules: Feature selection in text categorization
    Montañés, E
    Combarro, EF
    Díaz, I
    Ranilla, J
    Quevedo, JR
    COMPUTATIONAL SCIENCE - ICCS 2004, PT 1, PROCEEDINGS, 2004, 3036 : 666 - 669