Rough set based hybrid algorithm for text classification

Cited by: 47
Authors
Miao, Duoqian [1]
Duan, Qiguo [1]
Zhang, Hongyu [1]
Jiao, Na [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Variable precision rough set (VPRS); k-nearest neighbor (kNN); Rocchio algorithm;
DOI
10.1016/j.eswa.2008.12.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available on-line. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method without premodelling, kNN incurs a high cost when classifying new documents if the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data do not fit its underlying assumption well, the Rocchio classifier suffers. In this paper, a hybrid algorithm based on variable precision rough set is proposed to combine the strengths of both the kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves significant performance improvement. (C) 2008 Elsevier Ltd. All rights reserved.
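To make the trade-off described in the abstract concrete, the following is a minimal Python sketch of the two baseline classifiers only. It is not the paper's hybrid algorithm, whose details are not given in this record. It assumes documents are already represented as sparse term-weight dictionaries, uses cosine similarity, and implements Rocchio in its simplified centroid form (no negative-example term). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_train(docs, labels):
    """Build one centroid (mean vector) per class: the 'premodelling' step."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(docs, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {t: s / counts[label] for t, s in terms.items()}
        for label, terms in sums.items()
    }


def rocchio_predict(centroids, vec):
    """Assign the class whose centroid is most similar: one comparison per class."""
    return max(centroids, key=lambda c: cosine(centroids[c], vec))


def knn_predict(docs, labels, vec, k=3):
    """Lazy learner: compares the query against every training document."""
    ranked = sorted(zip(docs, labels),
                    key=lambda pair: cosine(pair[0], vec),
                    reverse=True)[:k]
    votes = Counter(label for _, label in ranked)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): raw term counts stand in for real term weights.
train_docs = [
    {"oil": 2, "price": 1},
    {"oil": 1, "barrel": 1},
    {"game": 2, "team": 1},
    {"team": 1, "score": 2},
]
train_labels = ["energy", "energy", "sports", "sports"]
query = {"oil": 1, "price": 1}

centroids = rocchio_train(train_docs, train_labels)
print(rocchio_predict(centroids, query))
print(knn_predict(train_docs, train_labels, query, k=3))
```

In this sketch the Rocchio prediction costs one similarity computation per class, while kNN costs one per training document, which is the classification-time cost the abstract attributes to kNN on large training sets. Conversely, a single centroid per class yields linear decision boundaries between classes, which is the limitation the abstract attributes to Rocchio.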
Pages: 9168-9174
Page count: 7