A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1]
Kondadadi, R [1]
Affiliations
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to help make the clustering efficient. Comparison with existing hard clustering algorithms like K-means and its variants shows that SISC is both effective and efficient.
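The difference between hard and soft clustering described in the abstract can be illustrated with a small sketch. This is not the SISC algorithm, whose details the abstract does not give. It only shows, under stated assumptions, how a similarity measure alone can let one document belong to several clusters. The cosine measure, the fixed seed texts, the `threshold` value, and the names `cosine_similarity` and `soft_assign` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def soft_assign(docs, seeds, threshold=0.25):
    """For each document, list every seed (cluster) index whose similarity
    to the document is at least `threshold`. A document may therefore
    appear in zero, one, or several clusters."""
    doc_vectors = [Counter(d.lower().split()) for d in docs]
    seed_vectors = [Counter(s.lower().split()) for s in seeds]
    return [
        [k for k, s in enumerate(seed_vectors)
         if cosine_similarity(v, s) >= threshold]
        for v in doc_vectors
    ]


# Toy data (made up for this example): the third document mixes two themes.
docs = [
    "football match score goal",
    "election vote parliament",
    "football club election vote",
]
seeds = ["football goal match", "election vote"]

memberships = soft_assign(docs, seeds)
for doc, clusters in zip(docs, memberships):
    print(doc, "->", clusters)
```

A hard clustering would force the third document into a single cluster, whereas the thresholded assignment keeps both of its themes.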
Pages: 40-47
Page count: 2