A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1]
Kondadadi, R [1]
Affiliation
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings. Hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document. Soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering, and it uses randomization to make the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
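The abstract does not spell out SISC's individual steps. What follows is therefore only a minimal sketch of the general idea it describes, not the authors' algorithm. It uses nothing but a similarity function, here assumed to be cosine similarity over bag-of-words counts. It picks cluster representatives at random and lets each document join every cluster whose representative it is sufficiently similar to. The names `cosine` and `soft_cluster`, the `threshold` parameter, and the single-pass assignment are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def soft_cluster(docs, k, threshold=0.3, seeds=None, rng_seed=0):
    """Hypothetical soft clustering sketch (not SISC itself).

    Returns (seeds, memberships), where memberships[i] maps each cluster
    index that document i belongs to onto its similarity to that
    cluster's representative document.
    """
    vecs = [Counter(doc.lower().split()) for doc in docs]
    if seeds is None:
        # Randomization: sample k distinct documents as cluster representatives
        seeds = random.Random(rng_seed).sample(range(len(docs)), k)
    memberships = []
    for v in vecs:
        sims = [cosine(v, vecs[s]) for s in seeds]
        # Soft assignment: join every cluster that clears the threshold
        member = {c: s for c, s in enumerate(sims) if s >= threshold}
        if not member:
            # Fall back to the single most similar cluster
            best = max(range(len(seeds)), key=lambda c: sims[c])
            member = {best: sims[best]}
        memberships.append(member)
    return seeds, memberships


docs = [
    "apple banana fruit",
    "apple fruit juice",
    "car engine wheel",
    "car wheel tire",
    "apple car",
]
seeds, memberships = soft_cluster(docs, k=2, seeds=[0, 2])
for doc, member in zip(docs, memberships):
    print(doc, "->", sorted(member))
```

With the representatives fixed to documents 0 and 2, the mixed document "apple car" lands in both clusters. A hard method such as K-means would have to place it in exactly one. Raising `threshold` pushes the assignment toward hard clustering.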
Pages: 40-47
Page count: 2
Related papers
50 in total
  • [31] Improving performance of similarity-based clustering by feature weight learning
    Yeung, DS
    Wang, XZ
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (04) : 556 - 561
  • [32] A Fuzzy Similarity-based Clustering Optimized by Particle Swarm Optimization
    Chen Donghui
    Liu Zhijing
    Wang Zonghu
    CHINESE JOURNAL OF ELECTRONICS, 2013, 22 (03) : 461 - 465
  • [33] Similarity-based clustering of multifeature objects in visual working memory
    Gaeun Son
    Sang Chul Chong
    Attention, Perception, & Psychophysics, 2023, 85 : 2242 - 2256
  • [34] Similarity-based Attention Embedding Approach for Attributed Graph Clustering
    Weng, Wei
    Li, Tong
    Liao, Jian-Chao
    Guo, Feng
    Chen, Fen
    Wei, Bo-Wen
    Journal of Network Intelligence, 2022, 7 (04) : 848 - 861
  • [35] Similarity-based clustering of multifeature objects in visual working memory
    Son, Gaeun
    Chong, Sang Chul
    ATTENTION PERCEPTION & PSYCHOPHYSICS, 2023, 85 (07) : 2242 - 2256
  • [37] Performance studies of some similarity-based fuzzy Clustering algorithms
    School of Information Technology, Indian Institute of Technology, Kharagpur-721302, West Bengal, India
    Unknown
    Int. J. Perform. Eng., 2006, 2 : 191 - 200
  • [38] Similarity-based Clustering by Left-Stochastic Matrix Factorization
    Arora, Raman
    Gupta, Maya R.
    Kapila, Amol
    Fazel, Maryam
    JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 1715 - 1746
  • [39] Similarity-based clustering of sequences using hidden Markov models
    Bicego, M
    Murino, V
    Figueiredo, MAT
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2003, 2734 : 86 - 95
  • [40] A similarity-based method for retrieving documents from the SCI/SSCI database
    Chen, Yen-Liang
    Wei, Jhong-Jhih
    Wu, Shin-Yi
    Hu, Ya-Han
    JOURNAL OF INFORMATION SCIENCE, 2006, 32 (05) : 449 - 464