Rough set theory for document clustering: A review

Cited by: 17
Authors
Vidhya, K. A. [1]
Geetha, T. V. [1]
Affiliation
[1] Anna Univ, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Rough set theory; document clustering; machine learning; approximation space; OUTLIER DETECTION; CLASSIFICATION; ALGORITHM; MODEL;
DOI
10.3233/JIFS-162006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Rough set theory is a mathematical framework that can be viewed as a soft computing tool for dealing with vagueness and uncertainty in data, and it is applied to pattern recognition, data mining, and knowledge discovery. Document clustering is another area of research, in which documents are treated as bags of words that describe the contents of each cluster. This work analyzes how rough set theory is used for document clustering to address issues that clustering methods face. This survey presents an exhaustive literature review of the concept of rough sets and of how the lower and upper approximations of a set can be used for document clustering. Rough set clusters are shown to be useful for representing real-time applications such as biomedical inference, network data handling, and citation analysis. The survey proceeds in phases: it first shows how machine learning algorithms have been incorporated into document clustering using rough set theory, then how rough set theory has been extended to document clustering through feature selection and feature/dimensionality reduction, and finally gives an overview of assorted clustering tasks where rough set theory is applied. The paper also presents a classification of rough set approaches to document clustering and their applications. Rough set theory helps resolve ambiguity and uncertainty in data. To the best of our knowledge, no survey of rough set clustering has appeared in the literature reviewed, and the survey ends with a critical analysis of rough set theory in each clustering application.
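The lower and upper approximations mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `approximations`, the toy documents, and the choice of "same set of index terms" as the indiscernibility relation are all assumptions made for illustration.

```python
from collections import defaultdict

def approximations(universe, key, target):
    """Return (lower, upper) approximations of `target`.

    Objects with the same `key` value are indiscernible and form one
    equivalence class (hypothetical helper, not from the paper).
    """
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # class lies entirely inside the target
            lower |= block
        if block & target:       # class overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy corpus: each document is described by its index terms.
docs = {
    "d1": frozenset({"rough", "set"}),
    "d2": frozenset({"rough", "set"}),
    "d3": frozenset({"cluster"}),
    "d4": frozenset({"cluster"}),
    "d5": frozenset({"fuzzy"}),
}
target = {"d1", "d2", "d3"}   # the set of documents to approximate

lower, upper = approximations(docs, docs.get, target)
boundary = upper - lower      # documents whose membership is uncertain
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here `d3` and `d4` cannot be told apart by their terms, so both fall in the boundary region: they possibly, but not certainly, belong to the target set. Tolerance or similarity rough set models, as in some of the clustering work the survey covers, replace the equivalence classes with overlapping neighbourhoods; this sketch covers only the classical case.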
Pages: 2165-2185
Number of pages: 21