A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1]
Kondadadi, R [1]
Affiliations
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives the user a good overall view of the information contained in a document collection. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to help make the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
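The distinction between hard and soft clustering drawn in the abstract can be illustrated with a minimal sketch. This is not the SISC procedure itself, which the abstract does not detail; the cosine similarity, the fixed centroids, the `threshold` value, and the function names `hard_assign` and `soft_assign` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hard_assign(doc, centroids, sim=cosine):
    """Hard clustering: index of the single most similar centroid."""
    return max(range(len(centroids)), key=lambda i: sim(doc, centroids[i]))

def soft_assign(doc, centroids, threshold=0.3, sim=cosine):
    """Soft clustering: every centroid whose similarity reaches the
    (assumed, illustrative) threshold."""
    return [i for i, c in enumerate(centroids) if sim(doc, c) >= threshold]

# Two hypothetical cluster centroids: finance and sports.
centroids = [
    Counter("stock market price trade".split()),
    Counter("football match goal team".split()),
]
# A document that mixes both themes.
doc = Counter("team stock price goal".split())

hard = hard_assign(doc, centroids)
soft = soft_assign(doc, centroids)
```

In this toy setup the document is equally similar to both centroids, so the hard assignment keeps only one cluster while the soft assignment reports both themes. Any similarity function with the same two-argument shape could be passed as `sim`, which mirrors the abstract's point that only a similarity measure is required.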
Pages: 40 - 47
Page count: 2
Related papers
50 in total
  • [1] A Similarity-Based Clustering Algorithm for Fuzzy Data
    Hung, Wen-Liang
    Yang, Miin-Shen
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2010), 2010
  • [2] Subspace Similarity-based Algorithm for Combine Multiple Clustering
    Xu, Sen
    Li, Xianfeng
    Chen, Rong
    Wu, Shuang
    Ni, Jun
    [J]. 2013 SEVENTH INTERNATIONAL CONFERENCE ON INTERNET COMPUTING FOR ENGINEERING AND SCIENCE (ICICSE 2013), 2013 : 69 - 76
  • [3] An efficient similarity-based validity index for kernel clustering algorithm
    Pu, Yun-Wei
    Zhu, Ming
    Jin, Wei-Dong
    Hu, Lai-Zhao
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 1044 - 1049
  • [4] A clustering algorithm for short documents based on concept similarity
    Peng, Jing
    Yang, Dong-qing
    Wang, Jian-wei
    Wu, Meng-qing
    Wang, Jun-gang
    [J]. 2007 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2007 : 42 - 45
  • [5] A word-based soft clustering algorithm for documents
    Lin, KI
    Kondadadi, R
    [J]. COMPUTERS AND THEIR APPLICATIONS, 2001 : 391 - 394
  • [6] An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining
    Miloudi, Salim
    Wang, Yulin
    Ding, Wenjia
    [J]. ENTROPY, 2021, 23 (05)
  • [7] Similarity-based chemical clustering techniques
    Gute, BD
    Basak, SC
    Mills, D
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 229 : U789 - U789
  • [8] Semantic Similarity-Based Clustering of Web Documents Using Fuzzy C-Means
    Avanija, J.
    Ramar, K.
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2015, 14 (03)
  • [9] A similarity-based robust clustering method
    Yang, MS
    Wu, KL
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (04) : 434 - 448
  • [10] Ranking Documents using Similarity-based PageRanks
    Hatakenaka, Shota
    Miura, Takao
    [J]. 2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011 : 19 - 24