Rough set based hybrid algorithm for text classification

Cited by: 47
Authors
Miao, Duoqian [1]
Duan, Qiguo [1]
Zhang, Hongyu [1]
Jiao, Na [1]
Affiliation
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Variable precision rough set (VPRS); k-nearest neighbor (kNN); Rocchio algorithm;
DOI
10.1016/j.eswa.2008.12.026
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available on-line. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method without premodelling, kNN incurs a high cost when classifying new documents if the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data do not fit this underlying assumption well, the Rocchio classifier suffers. In this paper, a hybrid algorithm based on the variable precision rough set is proposed to combine the strengths of both the kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves significant performance improvement. (C) 2008 Elsevier Ltd. All rights reserved.
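For readers unfamiliar with the two baseline classifiers named in the abstract, the following is a minimal sketch of each, assuming documents are already represented as term-frequency vectors and compared by cosine similarity. It is not the paper's hybrid algorithm, and the variable precision rough set step is not shown. The function names, the toy vectors, and the labels "sports" and "finance" are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_fit(X, y):
    """Rocchio-style training: one centroid (mean vector) per class."""
    sums, counts = {}, Counter(y)
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def rocchio_predict(centroids, doc):
    """Assign the class whose centroid is most similar to the document.

    Cost per query grows with the number of classes, not the training set.
    """
    return max(centroids, key=lambda c: cosine(centroids[c], doc))


def knn_predict(X, y, doc, k=3):
    """Lazy kNN: compare the document with every training vector, then vote.

    Cost per query grows with the size of the training set.
    """
    ranked = sorted(zip(X, y), key=lambda p: cosine(p[0], doc), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-frequency vectors over three hypothetical terms.
X_train = [[3, 0, 1], [2, 1, 0], [4, 0, 0], [0, 3, 1], [1, 2, 0], [0, 4, 1]]
y_train = ["sports", "sports", "sports", "finance", "finance", "finance"]

centroids = rocchio_fit(X_train, y_train)
new_doc = [2, 0, 1]
print(rocchio_predict(centroids, new_doc))
print(knn_predict(X_train, y_train, new_doc, k=3))
```

The sketch shows the trade-off the abstract describes: Rocchio compresses each class into a single prototype, which is cheap at query time but assumes classes are separable around their centroids, whereas kNN keeps every training document and pays for that at query time.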
Pages: 9168-9174
Page count: 7
Related papers
50 in total
  • [1] Rough Set Based Approach to Text Classification
    Zhang, Libiao
    Li, Yuefeng
    Sun, Chao
    Nadee, Wanvimol
    2013 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY - WORKSHOPS (WI-IAT), VOL 3, 2013, : 245 - 252
  • [2] The naive Bayes text classification algorithm based on rough set in the cloud platform
    Dai, Yugang
    Sun, Haosheng
    Journal of Chemical and Pharmaceutical Research, 2014, 6 (07) : 1636 - 1643
  • [3] Partition for the rough set-based text classification
    Bao, YG
    Asai, D
    Du, XY
    Ishii, N
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 181 - 188
  • [4] Rough set and ensemble learning based semi-supervised algorithm for text classification
    Shi, Lei
    Ma, Xinming
    Xi, Lei
    Duan, Qiguo
    Zhao, Jingying
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 6300 - 6306
  • [5] A rough set-based approach to text classification
    Chouchoulas, A
    Shen, Q
    NEW DIRECTIONS IN ROUGH SETS, DATA MINING, AND GRANULAR-SOFT COMPUTING, 1999, 1711 : 118 - 127
  • [6] Automatic text classification based on rough set and improved quick-reduce algorithm
    Jiang, MH
    Deng, BX
    Sheng, XW
    Tang, XF
    Ruan, QQ
    Yuan, BZ
    2004 7TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS 1-3, 2004, : 2712 - 2715
  • [7] A rough set based hybrid approach for classification
    Hussein, Ahmed Saad
    Li, Tianrui
    Jaber, Noora Sabah
    Yohannese, Chubato Wondaferaw
    DATA SCIENCE AND KNOWLEDGE ENGINEERING FOR SENSING DECISION SUPPORT, 2018, 11 : 683 - 690
  • [8] A kind of hybrid classification algorithm based on rough set and support vector machine
    Wang, LS
    Xu, YT
    Zhao, LS
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 1676 - 1679
  • [9] Rule generation based on rough set theory for text classification
    Bi, YX
    Anderson, T
    McClean, S
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XVII, 2001, : 157 - 170
  • [10] An effective rough set-based method for text classification
    Bao, YG
    Asai, D
    Du, XY
    Yamada, K
    Ishii, N
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 545 - 552