A Similarity Rough Set Model for Document Representation and Document Clustering

Cited by: 2
Authors
Nguyen Chi Thanh [1]
Yamada, Koichi [1]
Unehara, Muneyuki [1]
Affiliations
[1] Nagaoka Univ Technol, Dept Management & Informat Syst Sci, 1603-1 Kamitomioka, Nagaoka, Niigata 9402188, Japan
Keywords
document clustering; document representation; rough sets; text mining
DOI
10.20965/jaciii.2011.p0125
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is a text mining technique for unsupervised document organization. It helps users browse and navigate large sets of documents. Ho et al. proposed a Tolerance Rough Set Model (TRSM) [1] to improve the vector space model, which represents documents as vectors of terms, and applied it to document clustering. In this paper we analyze their model and propose a new model for efficient clustering of documents. We introduce the Similarity Rough Set Model (SRSM) as another model for representing documents in document clustering. The model is evaluated by experiments on test collections. The experimental results show that the SRSM document clustering method outperforms the TRSM-based one, and that the results of SRSM are less affected by the value of the parameter than those of TRSM.
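As an illustration of the TRSM idea mentioned in the abstract, the following is a minimal sketch, assuming the commonly described formulation in which two terms are tolerant of each other when they co-occur in at least a threshold number of documents, and a document is enriched with every term whose tolerance class overlaps it (its upper approximation). It is not the SRSM proposed in this paper, term weighting is omitted, and the names `tolerance_classes`, `upper_approximation`, and `theta` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the set of terms that co-occur with it
    in at least `theta` documents (the term itself is always included)."""
    cooc = Counter()
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # reflexive by construction
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # symmetric: add in both directions
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class intersects the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Toy corpus (assumed data, for illustration only).
docs = [
    ["rough", "set", "clustering"],
    ["rough", "set", "model"],
    ["document", "clustering"],
    ["rough", "document"],
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation(docs[3], classes)
print(sorted(enriched))
```

In this toy corpus only "rough" and "set" co-occur twice, so the last document, which contains "rough" but not "set", gains "set" in its enriched representation. The threshold `theta` plays the role of the parameter whose influence on clustering results the abstract discusses.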
Pages: 125-133
Number of pages: 9
Related papers
  • [1] Nonhierarchical document clustering based on a tolerance rough set model
    Ho, TB
    Nguyen, NB
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2002, 17 (02) : 199 - 212
  • [2] Hierarchical Document Clustering Based on Tolerance Rough Set Model
    Kawasaki, Saori
    Nguyen, Ngoc Binh
    Ho, Tu Bao
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 458 - 463
  • [3] Document Clustering Based on Fuzzy Rough Set
    Zhou Peng
    Li Zhishu
    Cheng Yang
    Huang Zhiguo
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS, 2009, : 701 - +
  • [4] Rough set theory for document clustering: A review
    Vidhya, K. A.
    Geetha, T. V.
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (03) : 2165 - 2185
  • [5] Cross-Lingual Document Representation and Semantic Similarity Measure: A Fuzzy Set and Rough Set Based Approach
    Huang, Hsun-Hui
    Kuo, Yau-Hwang
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (06) : 1098 - 1111
  • [6] Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
    Qiu, Dong
    Jiang, Haihuan
    Yan, Ruiteng
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 1218 - 1226
  • [7] Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
    Dong Qiu
    Haihuan Jiang
    Ruiteng Yan
    [J]. International Journal of Computational Intelligence Systems, 2020, 13 : 1218 - 1226
  • [8] An Unified Approach for Multimedia Document Representation and Document Similarity
    Pushpalatha, K.
    Ananthanarayana, V. S.
    [J]. 2014 IEEE 17TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, : 249 - 256
  • [9] Weight Learning for Document Tolerance Rough Set Model
    Swieboda, Wojciech
    Meina, Michal
    Hung Son Nguyen
    [J]. ROUGH SETS AND KNOWLEDGE TECHNOLOGY: 8TH INTERNATIONAL CONFERENCE, 2013, 8171 : 385 - 396
  • [10] Semantically Enriching Text Representation Model for Document Clustering
    Kim, Han-joon
    Hong, Kee-joo
    Chang, Jae Young
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 922 - 925