Text Categorization Based on Clustering Feature Selection

Cited by: 19
Authors
Zhou, Xiaofei [1]
Hu, Yue [1]
Guo, Li [1]
Affiliations
[1] Chinese Academy of Sciences, Institute of Information Engineering, Beijing 100095, People's Republic of China
Keywords
Feature selection; text categorization; k-means
DOI
10.1016/j.procs.2014.05.283
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we discuss a text categorization method based on k-means clustering feature selection. K-means is a classical algorithm for data clustering in text mining, but it is seldom used for feature selection. For text data, the words that correctly express the semantics of a class are usually good features. We use k-means to obtain several cluster centroids for each class, and then choose the high-frequency words in the centroids as the text features for categorization. The words extracted by k-means not only represent the clusters of each class well, but also have high quality for semantic expression. On three standard text databases, classifiers based on our feature selection method perform better than the original classifiers for text categorization. © 2014 Published by Elsevier B.V. Open access under CC BY-NC-ND license.
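The procedure outlined in the abstract (cluster each class with k-means, then keep the highest-weight words of each centroid) can be illustrated with a minimal sketch. The abstract does not specify the document representation, the distance measure, the number of clusters, or how many words are kept per centroid, so everything here is an assumption: raw term-frequency vectors, squared Euclidean distance, deterministic initialization from the first k documents, and the hypothetical names `tf_vector`, `kmeans`, `select_features`, `k`, and `top_n`. It is not the authors' implementation.

```python
from collections import Counter


def tf_vector(tokens, vocab):
    """Raw term-frequency vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def kmeans(vectors, k, iters=20):
    """Plain k-means with squared Euclidean distance.

    Assumption: centroids start at the first k vectors so the sketch is
    deterministic; the paper's initialization is not stated in the abstract.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                # Mean of each dimension over the cluster members.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the previous centroid when a cluster is empty.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def select_features(docs_by_class, k=2, top_n=2):
    """Union of the top_n highest-weight words of every per-class centroid."""
    vocab = sorted({w for docs in docs_by_class.values() for d in docs for w in d})
    selected = set()
    for docs in docs_by_class.values():
        vectors = [tf_vector(d, vocab) for d in docs]
        for centroid in kmeans(vectors, min(k, len(vectors))):
            # Rank by centroid weight, breaking ties alphabetically.
            ranked = sorted(range(len(vocab)), key=lambda i: (-centroid[i], vocab[i]))
            selected.update(vocab[i] for i in ranked[:top_n])
    return sorted(selected)


# Toy, made-up corpus: tokenized documents grouped by class label.
docs_by_class = {
    "sports": [["goal", "match", "team"], ["team", "goal", "goal"], ["match", "referee", "team"]],
    "finance": [["stock", "market", "bank"], ["bank", "loan", "bank"], ["market", "stock", "stock"]],
}
features = select_features(docs_by_class, k=2, top_n=2)
print(features)
```

The selected words would then define the feature space passed to a downstream classifier. In practice a weighting such as TF-IDF and a library k-means implementation would likely replace these hand-written pieces.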
Pages: 398-405
Number of pages: 8