Feature Selection Method Based on Crossed Centroid for Text Categorization

Cited by: 0
Authors
Yang, Jieming [1]
Liu, Zhiying [1]
Qu, Zhaoyang [1]
Wang, Jing [1]
Affiliation
[1] Northeast Dianli Univ, Sch Informat Engn, Jilin, Jilin, Peoples R China
Keywords
feature selection; text categorization; across centroid; high dimension; ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The defining characteristic of text categorization is high dimensionality, even for moderately sized datasets. Feature selection, which reduces dimensionality without sacrificing categorization performance and helps avoid over-fitting, is a commonly used approach to dimensionality reduction. In this paper, we propose a new feature selection method that evaluates the deviation from the centroid at both the inter-category and intra-category levels. We compare the proposed method with four well-known feature selection algorithms using support vector machines on three benchmark datasets (20 Newsgroups, Reuters-21578 and WebKB). The experimental results show that the proposed method can significantly improve the performance of the classifier.
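The abstract does not give the scoring formula, so the following is only a rough sketch of what a centroid-based feature score can look like, not the authors' method. It assumes a Fisher-style ratio: for each term, the spread of the category centroids around the global centroid (inter-category) divided by the spread of documents around their own category centroid (intra-category). The function names `centroid_scores` and `select_top_k` and the toy data are hypothetical.

```python
from collections import defaultdict


def centroid_scores(docs, labels):
    """Score each feature by inter-category centroid spread divided by
    intra-category spread (a Fisher-style ratio, assumed for illustration)."""
    n_features = len(docs[0])
    by_cat = defaultdict(list)
    for vec, lab in zip(docs, labels):
        by_cat[lab].append(vec)

    # Global centroid: mean of every document vector, per feature.
    global_c = [sum(v[j] for v in docs) / len(docs) for j in range(n_features)]

    scores = []
    for j in range(n_features):
        between = 0.0  # inter-category: category centroid vs. global centroid
        within = 0.0   # intra-category: documents vs. their category centroid
        for vecs in by_cat.values():
            cat_c = sum(v[j] for v in vecs) / len(vecs)
            between += len(vecs) * (cat_c - global_c[j]) ** 2
            within += sum((v[j] - cat_c) ** 2 for v in vecs)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-9))
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, sorted by index."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy term-frequency matrix: 4 documents, 3 terms, 2 categories.
docs = [[3, 1, 0], [4, 1, 1], [0, 1, 1], [1, 1, 0]]
labels = ["sport", "sport", "tech", "tech"]
scores = centroid_scores(docs, labels)
selected = select_top_k(scores, 1)
```

In this toy example only the first term separates the two categories, so it is the one retained. In an experiment like the one described, the reduced document vectors would then be passed to a support vector machine classifier.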
Pages: 11-15
Number of pages: 5