Nonhierarchical document clustering based on a tolerance rough set model

Cited by: 59
Authors
Ho, TB [1]
Nguyen, NB
Affiliations
[1] Japan Adv Inst Sci & Technol, Tatsunokuchi, Ishikawa 9231292, Japan
[2] Hanoi Univ Technol, Hanoi, Vietnam
Keywords
Algorithms - Data mining - Database systems - Information retrieval - Mathematical models - Rough set theory - Semantics
DOI
10.1002/int.10016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering, the grouping of documents into several clusters, has been recognized as a means for improving efficiency and effectiveness of information retrieval and text mining. With the growing importance of electronic media for storing and exchanging large textual databases, document clustering becomes more significant. Hierarchical document clustering methods, having a dominant role in document clustering, seem inadequate for large document databases as the time and space requirements are typically of order O(N^3) and O(N^2), where N is the number of index terms in a database. In addition, when each document is characterized by only several terms or keywords, clustering algorithms often produce poor results as most similarity measures yield many zero values. In this article we introduce a nonhierarchical document clustering algorithm based on a proposed tolerance rough set model (TRSM). This algorithm contributes two considerable features: (1) it can be applied to large document databases, as the time and space requirements are of order O(N log N) and O(N), respectively; and (2) it can be well adapted to documents characterized by a few terms due to the TRSM's ability of semantic calculation. The algorithm has been evaluated and validated by experiments on test collections. © 2002 John Wiley & Sons, Inc.
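The zero-similarity problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual TRSM formulation, in which two terms tolerate each other when they co-occur in at least `theta` documents, and a document's upper approximation contains every term whose tolerance class overlaps the document. The function names, the toy corpus, and the use of Jaccard similarity are illustrative assumptions, not the authors' implementation, and the sketch makes no attempt to reach the O(N log N) bound stated above.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents."""
    cooc = defaultdict(int)
    for terms in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(terms)), 2):
            cooc[(a, b)] += 1
    vocab = {t for terms in docs for t in terms}
    classes = {t: {t} for t in vocab}  # every term tolerates itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc_terms, classes):
    """Return all terms whose tolerance class intersects the document."""
    doc = set(doc_terms)
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(x, y):
    """Jaccard similarity of two term collections (assumed measure)."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if x | y else 0.0


# Hypothetical toy corpus: each document is a short list of index terms.
docs = [
    ["rough", "set"],
    ["rough", "set", "clustering"],
    ["clustering", "document"],
    ["document", "retrieval"],
    ["document", "retrieval", "text"],
]
classes = tolerance_classes(docs, theta=2)

d1, d2 = ["rough"], ["set"]  # two one-term documents with no shared term
raw = jaccard(d1, d2)
enriched = jaccard(upper_approximation(d1, classes),
                   upper_approximation(d2, classes))
print(raw, enriched)
```

Because "rough" and "set" co-occur in two documents of the toy corpus, both one-term documents expand to the same upper approximation, so their similarity is no longer zero.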
Pages: 199-212
Number of pages: 14
Related papers
50 in total
  • [21] Clustering Based on Rough Set Knowledge Discovery
    Shan, Chen
    [J]. FUTURE COMPUTER, COMMUNICATION, CONTROL AND AUTOMATION, 2011, 119 : 561 - 565
  • [22] Student Management Based on Rough Set and Clustering
    Ren, Xueli
    Dai, Yubiao
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON SENSOR NETWORK AND COMPUTER ENGINEERING, 2016, 68 : 501 - 505
  • [23] Clustering of Web Learners Based on Rough Set
    LIU Shuai-dong
    [J]. Wuhan University Journal of Natural Sciences, 2004, (05) : 542 - 546
  • [24] A rough set-based fuzzy clustering
    Zhao, YQ
    Zhou, XZ
    Tang, GZ
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 401 - 409
  • [25] On-line Hot Topic Recommendation Using Tolerance Rough Set Based Topic Clustering
    Wu, Yonghui
    Ding, Yuxin
    Wang, Xiaolong
    Xu, Jun
    [J]. JOURNAL OF COMPUTERS, 2010, 5 (04) : 549 - 556
  • [26] Rough Set Based on Valued Tolerance Relation
    Luo, Jun-Fang
    Qin, Ke-Yun
    [J]. INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND AUTOMATION (ICCEA 2014), 2014, : 320 - 323
  • [27] Evaluation Model for Grid Safe Production Based on Fuzzy Clustering and Rough Set
    Zhang Liying
    Qi Jianxun
    [J]. WMSO: 2008 INTERNATIONAL WORKSHOP ON MODELLING, SIMULATION AND OPTIMIZATION, PROCEEDINGS, 2009, : 196 - 199
  • [28] Web document classification based on extended rough set
    Yi, GX
    Hu, HP
    Lu, ZD
    [J]. PDCAT 2005: SIXTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2005, : 916 - 918
  • [29] Three-way decisions model based on tolerance rough fuzzy set
    Junhai Zhai
    Yao Zhang
    Hongyu Zhu
    [J]. International Journal of Machine Learning and Cybernetics, 2017, 8 : 35 - 43
  • [30] Extended Rough Set Model Based on Prior Probability and Valued Tolerance Relation
    Hao-Dong Zhu
    Hong-Chan Li
    [J]. Journal of Electronic Science and Technology, 2011, 9 (01) : 46 - 50