Feature Selection Method Based on Crossed Centroid for Text Categorization

Citations: 0
Authors
Yang, Jieming [1 ]
Liu, Zhiying [1 ]
Qu, Zhaoyang [1 ]
Wang, Jing [1 ]
Affiliations
[1] Northeast Dianli Univ, Sch Informat Engn, Jilin, Jilin, Peoples R China
Keywords
feature selection; text categorization; across centroid; high dimension; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The defining characteristic of text categorization is high dimensionality, even for moderately sized datasets. Feature selection is a commonly used approach to dimensionality reduction: it shrinks the feature space without sacrificing categorization performance and helps avoid over-fitting. In this paper, we propose a new feature selection method that evaluates each feature's deviation from the centroid at both the inter-category and intra-category levels. We compare the proposed method with four well-known feature selection algorithms, using support vector machines on three benchmark datasets (20-Newsgroups, Reuters-21578 and WebKB). The experimental results show that the proposed method can significantly improve the performance of the classifier.
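The abstract does not state the scoring formula, so the following is only an illustrative sketch of how a centroid-deviation score could be computed, not the authors' method. It assumes a Fisher-style ratio: for each feature, the squared distance between a category centroid and the global centroid (inter-category) is divided by the variance around that category centroid (intra-category). The function names, the `eps` smoothing term, and the toy data are all hypothetical.

```python
from collections import defaultdict


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def centroid_deviation_scores(X, y, eps=1e-9):
    """Score each feature by inter-category vs. intra-category deviation.

    Assumed (hypothetical) score, not taken from the paper:
        score_j = sum_k (n_k / n) * (c_kj - g_j)^2 / (v_kj + eps)
    where g is the global centroid, c_k the centroid of category k,
    v_kj the variance of feature j inside category k, and n_k its size.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    g = centroid(X)
    n = len(X)
    scores = [0.0] * len(g)
    for rows in by_class.values():
        c = centroid(rows)
        weight = len(rows) / n
        for j in range(len(g)):
            # Intra-category spread of feature j around the category centroid.
            v = sum((r[j] - c[j]) ** 2 for r in rows) / len(rows)
            # Inter-category deviation, discounted by the intra-category spread.
            scores[j] += weight * (c[j] - g[j]) ** 2 / (v + eps)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy term-frequency matrix: feature 0 separates the categories, feature 1 does not.
X = [[3.0, 1.0], [4.0, 2.0], [0.0, 1.0], [1.0, 2.0]]
y = ["sport", "sport", "tech", "tech"]
scores = centroid_deviation_scores(X, y)
top = select_top_k(scores, 1)
```

On this toy input, feature 1 has the same centroid in both categories, so it scores zero and feature 0 is selected. In a setup like the one the abstract describes, the selected columns would then be passed to a classifier such as a support vector machine.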
Pages: 11-15
Number of pages: 5