Nonhierarchical document clustering based on a tolerance rough set model

Cited by: 59
Authors
Ho, TB [1]
Nguyen, NB
Affiliations
[1] Japan Adv Inst Sci & Technol, Tatsunokuchi, Ishikawa 9231292, Japan
[2] Hanoi Univ Technol, Hanoi, Vietnam
Keywords
Algorithms; Data mining; Database systems; Information retrieval; Mathematical models; Rough set theory; Semantics
DOI
10.1002/int.10016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering, the grouping of documents into several clusters, has been recognized as a means for improving the efficiency and effectiveness of information retrieval and text mining. With the growing importance of electronic media for storing and exchanging large textual databases, document clustering becomes more significant. Hierarchical document clustering methods, which have a dominant role in document clustering, seem inadequate for large document databases, as the time and space requirements are typically of order O(N^3) and O(N^2), respectively, where N is the number of index terms in a database. In addition, when each document is characterized by only several terms or keywords, clustering algorithms often produce poor results, as most similarity measures yield many zero values. In this article we introduce a nonhierarchical document clustering algorithm based on a proposed tolerance rough set model (TRSM). This algorithm contributes two considerable features: (1) it can be applied to large document databases, as the time and space requirements are of order O(N log N) and O(N), respectively; and (2) it can be well adapted to documents characterized by a few terms, owing to the TRSM's ability of semantic calculation. The algorithm has been evaluated and validated by experiments on test collections. © 2002 John Wiley & Sons, Inc.
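The abstract's second point, that short documents yield many zero similarity values and that a tolerance-based model can compensate, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes that two terms are "tolerant" when they co-occur in at least `theta` documents, and that a document is enriched by every term whose tolerance class overlaps it (an upper approximation). The function names, the threshold `theta`, the toy documents, and the use of Jaccard similarity are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with in >= theta docs.

    Assumption: co-occurrence counting is used as the tolerance relation.
    """
    cooc = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            cooc[(a, b)] += 1
    classes = {term: {term} for doc in docs for term in doc}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares at least one term with doc."""
    return {term for term, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Plain set-overlap similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy collection (hypothetical): each document is a set of index terms.
docs = [
    {"rough", "set"},
    {"rough", "set"},
    {"rough"},
    {"set"},
    {"cluster", "set"},
]
classes = tolerance_classes(docs, theta=2)

raw_similarity = jaccard(docs[2], docs[3])
enriched_similarity = jaccard(
    upper_approximation(docs[2], classes),
    upper_approximation(docs[3], classes),
)
```

In this toy collection, documents 2 and 3 share no terms, so their raw similarity is zero; after enrichment both contain "rough" and "set", because those terms co-occur in two documents, and the similarity becomes non-zero.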
Pages: 199-212
Page count: 14