Feature Selection Method Based on Crossed Centroid for Text Categorization

被引:0
|
作者
Yang, Jieming [1 ]
Liu, Zhiying [1 ]
Qu, Zhaoyang [1 ]
Wang, Jing [1 ]
机构
[1] Northeast Dianli Univ, Sch Informat Engn, Jilin, Jilin, Peoples R China
关键词
feature selection; text categorization; across centroid; high dimension; ALGORITHM;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The most important characteristic of text categorization is the high dimensionality even for the moderate size dataset. Feature selection, which can reduce the size of the dimensionality without sacrificing the performance of the categorization and avoid over-fitting, is a commonly used approach in dimensionality reduction. In this paper, we proposed a new feature selection, which evaluates the deviation from the centroid based on both inter-category and intra-category. We compared the proposed method with four well-known feature selection algorithms using support vector machines on three benchmark datasets (20-newgroups, reuters-21578 and webkb). The experimental results show that the proposed method can significantly improve the performance of the classifier.
引用
收藏
页码:11 / 15
页数:5
相关论文
共 50 条
  • [1] A hybrid feature selection method for text categorization
    Montanes, E.
    Quevedo, J. R.
    Combarro, E. F.
    Diaz, I.
    Ranilla, J.
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2007, 15 (02) : 133 - 151
  • [2] An Effective Feature Selection Method for Text Categorization
    Qiu, Xipeng
    Zhou, Jinlong
    Huang, Xuanjing
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 50 - 61
  • [3] Text Categorization Based on Clustering Feature Selection
    Zhou, Xiaofei
    Hu, Yue
    Guo, Li
    [J]. 2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 398 - 405
  • [4] A new Chinese text feature selection method in centroid-based classifier
    Gu, Yijun
    Wang, Rong
    Wang, Jianhua
    Yu, Jiangde
    [J]. 2008 INTERNATIONAL SYMPOSIUM ON INFORMATION PROCESSING AND 2008 INTERNATIONAL PACIFIC WORKSHOP ON WEB MINING AND WEB-BASED APPLICATION, 2008, : 88 - +
  • [5] Enhancement of DTP feature selection method for text categorization
    Moyotl-Hernández, E
    Jiménez-Salazar, H
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 719 - 722
  • [6] A discriminative and semantic feature selection method for text categorization
    Zong, Wei
    Wu, Feng
    Chu, Lap-Keung
    Sculli, Domenic
    [J]. INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS, 2015, 165 : 215 - 222
  • [7] Feature selection based on feature interactions with application to text categorization
    Tang, Xiaochuan
    Dai, Yuanshun
    Xiang, Yanping
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 120 : 207 - 216
  • [8] A Method of Feature Selection Based on Word2Vec in Text Categorization
    Tian, Wenfeng
    Li, Jun
    Li, Hongguang
    [J]. 2018 37TH CHINESE CONTROL CONFERENCE (CCC), 2018, : 9452 - 9455
  • [9] Feature subset selection in SOM based text categorization
    Bassiouny, S
    Nagi, M
    Hussein, MF
    [J]. IC-AI '04 & MLMTA'04 , VOL 1 AND 2, PROCEEDINGS, 2004, : 860 - 866
  • [10] A New Feature Selection Method for Text Categorization of Customer Reviews
    Liu, Miao
    Lu, Xiaoling
    Song, Jie
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2016, 45 (04) : 1397 - 1409