A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1]
Kondadadi, R [1]
Affiliations
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
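Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract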
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to help make the clustering efficient. Comparison with existing hard clustering algorithms like K-means and its variants shows that SISC is both effective and efficient.
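
The difference between hard and soft clustering described in the abstract can be illustrated with a minimal Python sketch. This is not the SISC algorithm, whose details the abstract does not give; it only shows threshold-based soft assignment driven by a similarity measure. The names `cosine`, `soft_assign`, the toy documents, the fixed seed documents, and the threshold value are all assumptions made for illustration, and the randomization step mentioned in the abstract is not modeled.

```python
def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def soft_assign(docs, seeds, similarity, threshold):
    """Map each document index to every cluster whose seed is similar enough.

    A document may land in zero, one, or several clusters, which is what
    distinguishes soft clustering from hard clustering.
    """
    return {
        i: [k for k, seed in enumerate(seeds) if similarity(doc, seed) >= threshold]
        for i, doc in enumerate(docs)
    }


# Toy documents as term-weight dicts (hypothetical data).
docs = [
    {"web": 1.0, "search": 1.0},
    {"cluster": 1.0, "kmeans": 1.0},
    {"web": 1.0, "cluster": 1.0},  # touches both themes
]
seeds = [docs[0], docs[1]]  # one representative document per cluster (assumption)

memberships = soft_assign(docs, seeds, cosine, threshold=0.4)
print(memberships)
```

In this toy setup the third document shares one term with each seed, so it is assigned to both clusters, whereas a hard clustering method would have to pick only one.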
Pages: 40 - 47
Number of pages: 2
Related papers
50 in total
  • [21] Data integration by fuzzy similarity-based hierarchical clustering
    Angelo Ciaramella
    Davide Nardone
    Antonino Staiano
    [J]. BMC Bioinformatics, 21
  • [22] Spectral analysis of text collection for similarity-based clustering
    Li, WY
    Ng, WK
    Lim, EP
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2004, 3056 : 389 - 393
  • [23] Spectral analysis of text collection for similarity-based clustering
    Li, WY
    Ng, WK
    Lim, EP
    [J]. 20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004, : 833 - 833
  • [24] Predicting user preferences via similarity-based clustering
    Qin, Mian
    Buffett, Scott
    Fleming, Michael W.
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 222 - +
  • [25] Recursive Similarity-Based Algorithm for Deep Learning
    Maszczyk, Tomasz
    Duch, Wlodzislaw
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2012, PT III, 2012, 7665 : 390 - 397
  • [26] Neighborhood Similarity-Based Color Transfer Algorithm
    Li, Yanhao
    Li, Zhijiang
    Cao, Liqin
    [J]. ADVANCED GRAPHIC COMMUNICATIONS, PACKAGING TECHNOLOGY AND MATERIALS, 2016, 369 : 127 - 132
  • [27] Clustering XML documents based on structural similarity
    Xing, Guangming
    Xia, Zhonghang
    Guo, Jinhua
    [J]. ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 905 - +
  • [28] A Similarity-Based Hierarchical Clustering Method for Manufacturing Process Models
    Ahn, Hyun
    Chang, Tai-Woo
    [J]. SUSTAINABILITY, 2019, 11 (09)
  • [29] Improving performance of similarity-based clustering by feature weight learning
    Yeung, DS
    Wang, XZ
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (04) : 556 - 561
  • [30] Similarity-based clustering of multifeature objects in visual working memory
    Gaeun Son
    Sang Chul Chong
    [J]. Attention, Perception, & Psychophysics, 2023, 85 : 2242 - 2256