A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1 ]
Kondadadi, R [1 ]
Affiliations
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to make the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
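The distinction the abstract draws between hard and soft clustering can be illustrated with a minimal sketch. This is not the SISC algorithm, whose details are not given in this record. It assumes cosine similarity over term counts as the similarity measure, two hand-written cluster centroids, and a hypothetical membership threshold of 0.3, all chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hard_assign(doc: Counter, centroids: dict) -> str:
    """Hard clustering: the document goes to exactly one cluster."""
    return max(centroids, key=lambda name: cosine_similarity(doc, centroids[name]))


def soft_assign(doc: Counter, centroids: dict, threshold: float = 0.3) -> list:
    """Soft clustering: the document joins every cluster that is similar enough.

    The threshold is a hypothetical parameter for this sketch.
    """
    return sorted(
        name
        for name, centroid in centroids.items()
        if cosine_similarity(doc, centroid) >= threshold
    )


# Illustrative centroids and a document that mixes two themes.
centroids = {
    "sports": Counter("game team score win".split()),
    "finance": Counter("market stock price bank".split()),
}
doc = Counter("team win stock price".split())

print(hard_assign(doc, centroids))  # a single cluster name
print(soft_assign(doc, centroids))  # every cluster above the threshold
```

In this toy case the document is equally similar to both centroids, so the hard assignment has to discard one of its two themes while the soft assignment keeps both.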
Pages: 40-47
Page count: 2