Text Categorization Based on Clustering Feature Selection

Cited by: 19
Authors
Zhou, Xiaofei [1]
Hu, Yue [1]
Guo, Li [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100095, Peoples R China
Keywords
Feature selection; text categorization; k-means
DOI
10.1016/j.procs.2014.05.283
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we discuss a text categorization method based on k-means clustering feature selection. K-means is a classical algorithm for data clustering in text mining, but it is seldom used for feature selection. For text data, the words that correctly express the semantics of a class are usually good features. We use the k-means method to obtain several cluster centroids for each class, and then choose the high-frequency words in the centroids as the text features for categorization. The words extracted by k-means not only represent the clusters of each class well, but also express the class semantics with high quality. On three standard text databases, classifiers based on our feature selection method perform better than the original classifiers for text categorization. (C) 2014 Published by Elsevier B.V. Open access under CC BY-NC-ND license.
Pages: 398-405
Page count: 8
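
The procedure outlined in the abstract (cluster each class with k-means, then keep the highest-weighted words of each centroid as features) can be illustrated with a minimal sketch. This is not the authors' implementation: the raw term-frequency vectors, squared Euclidean distance, the toy documents, and the names `tf_vector`, `kmeans`, `select_features`, `k`, and `top_n` are all assumptions made for illustration.

```python
import random

def tf_vector(tokens, index):
    """Raw term-frequency vector over a fixed vocabulary (assumed weighting)."""
    vec = [0.0] * len(index)
    for t in tokens:
        vec[index[t]] += 1.0
    return vec

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def select_features(docs_by_class, k=2, top_n=2):
    """Union of the top_n highest-weighted words of every per-class centroid."""
    vocab = sorted({t for docs in docs_by_class.values() for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    selected = set()
    for docs in docs_by_class.values():
        vectors = [tf_vector(d, index) for d in docs]
        for centroid in kmeans(vectors, min(k, len(vectors))):
            ranked = sorted(range(len(vocab)), key=lambda i: -centroid[i])
            selected.update(vocab[i] for i in ranked[:top_n] if centroid[i] > 0)
    return sorted(selected)

# Hypothetical toy corpus: tokenized documents grouped by class label.
docs_by_class = {
    "sports": [["goal", "match", "team"], ["team", "coach", "match"],
               ["goal", "team", "score"]],
    "finance": [["stock", "market", "bank"], ["bank", "loan", "market"],
                ["stock", "bank", "price"]],
}
features = select_features(docs_by_class, k=2, top_n=2)
```

The selected words would then define the reduced feature space in which documents are represented before training any classifier. Clustering within each class, rather than over the whole corpus, is what lets a class with several sub-topics contribute words from each of them.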