Rough set based hybrid algorithm for text classification

Cited by: 47
Authors
Miao, Duoqian [1]
Duan, Qiguo [1]
Zhang, Hongyu [1]
Jiao, Na [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text classification; Variable precision rough set (VPRS); k-nearest neighbor (kNN); Rocchio algorithm;
DOI
10.1016/j.eswa.2008.12.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available on-line. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method without premodelling, kNN has a high cost to classify new documents when the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data does not fit its underlying assumption well, the Rocchio classifier suffers. In this paper, a hybrid algorithm based on variable precision rough set is proposed to combine the strengths of both kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves significant performance improvement. (C) 2008 Elsevier Ltd. All rights reserved.
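The trade-off the abstract describes between the two baseline classifiers can be illustrated with a minimal sketch. This is not the paper's hybrid VPRS algorithm, which this record does not specify; it only shows plain kNN (which scans every training document at query time) next to a centroid-style Rocchio classifier (which compares the query with one prototype per class). The function names, the toy documents, the term weights, and the use of cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """Lazy learner: no model is built, so every query scans all of train."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Build one centroid (mean vector) per class.

    Simplified prototype form: the negative-example term of the full
    Rocchio formula is omitted here.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, weight in vec.items():
            sums[label][term] += weight
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda c: cosine(centroids[c], query))


# Hypothetical toy corpus: (term-weight vector, class label).
train = [
    ({"oil": 2.0, "price": 1.0}, "energy"),
    ({"oil": 1.0, "barrel": 2.0}, "energy"),
    ({"wheat": 2.0, "crop": 1.0}, "grain"),
    ({"wheat": 1.0, "harvest": 2.0}, "grain"),
]
query = {"oil": 1.0, "barrel": 1.0}

centroids = rocchio_fit(train)
knn_label = knn_predict(train, query, k=3)
rocchio_label = rocchio_predict(centroids, query)
print(knn_label, rocchio_label)
```

In this sketch the kNN cost per query grows with the number of training documents, while the Rocchio cost grows only with the number of classes, at the price of representing each class by a single prototype.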
Pages: 9168-9174
Number of pages: 7