A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1 ]
Kondadadi, R [1 ]
Affiliations
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to help make the clustering efficient. Comparison with existing hard clustering algorithms like K-means and its variants shows that SISC is both effective and efficient.
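The distinction between hard and soft clustering drawn in the abstract can be illustrated with a minimal sketch. This is not the SISC algorithm, whose details the abstract does not give. It only assumes a cosine similarity over bag-of-words vectors, fixed cluster centers, and a similarity threshold of 0.3. The function names, the threshold, and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bag(text):
    """Turn a string into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hard_assign(doc, centers):
    """Hard clustering: the document goes to exactly one (the most similar) center."""
    sims = [cosine_similarity(doc, c) for c in centers]
    return max(range(len(centers)), key=sims.__getitem__)


def soft_assign(doc, centers, threshold=0.3):
    """Soft clustering: the document joins every center it is similar enough to.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return [i for i, c in enumerate(centers)
            if cosine_similarity(doc, c) >= threshold]


# Two toy cluster centers (finance, sports) and a document touching both themes.
centers = [bag("stock market trading price"),
           bag("football match team score")]
doc = bag("team stock price rises after football match")

print(hard_assign(doc, centers))  # a single cluster index
print(soft_assign(doc, centers))  # possibly several cluster indices
```

In this toy example the hard assignment keeps only the sports cluster, while the soft assignment places the document in both clusters, which is the multi-theme behavior the abstract says hard clustering cannot capture.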
Pages: 40 - 47
Number of pages: 2
Related papers
50 in total
  • [41] SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering
    Xu, Liang
    Dou, Wensheng
    Gao, Chushu
    Wang, Jie
    Wei, Jun
    Zhong, Hua
    Huang, Tao
    2017 IEEE/ACM 14TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2017), 2017, : 158 - 169
  • [42] A Weight-Incorporated Similarity-Based Clustering Ensemble Method
    Liu, ShiYao
    Kang, Qi
    An, Jing
    Zhou, MengChu
    2014 IEEE 11TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2014, : 719 - 724
  • [43] SOFTWARE ARCHITECTURE RECOVERY THROUGH SIMILARITY-BASED GRAPH CLUSTERING
    Zhu, Jianlin
    Huang, Jin
    Zhou, Daicui
    Yin, Zhongbao
    Zhang, Guoping
    He, Qiang
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2013, 23 (04) : 559 - 586
  • [44] A fuzzy similarity-based clustering optimized by particle swarm optimization
    School of Computer Science and Technology, Xidian University, Xi'an 710071, China
    Chin J Electron, 2013, 3 (461-465):
  • [45] Fuzzy Similarity-Based Hierarchical Clustering for Atmospheric Pollutants Prediction
    Camastra, F.
    Ciaramella, A.
    Son, L. H.
    Riccio, A.
    Staiano, A.
    FUZZY LOGIC AND APPLICATIONS, WILF 2018, 2019, 11291 : 123 - 133
  • [46] A new similarity-based multicriteria recommendation algorithm based on autoencoders
    Batmaz, Zeynep
    Kaleli, Cihan
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2022, 30 (03) : 855 - 870
  • [47] Module overlapping structure detection in PPI using an improved link similarity-based Markov clustering algorithm
    Gu, L.
    Han, Y.
    Wang, C.
    Chen, Wei
    Jiao, Jun
    Yuan, X.
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (05) : 1481 - 1490
  • [48] A New Generalized Similarity-Based Topic Distillation Algorithm
    ZHOU Hongfang1
    2. Xi’an Branch
    Wuhan University Journal of Natural Sciences, 2007, (05) : 789 - 792
  • [49] A Flexible Similarity-Based Algorithm for Tool Condition Monitoring
    Stuhr, Benjamin
    Liu, Rui
    JOURNAL OF MANUFACTURING SCIENCE AND ENGINEERING-TRANSACTIONS OF THE ASME, 2022, 144 (03):
  • [50] A Similarity-Based Learning Algorithm Using Distance Transformation
    Hu, Yuh-Jyh
    Yu, Min-Che
    Wang, Hsiang-An
    Ting, Zih-Yun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (06) : 1452 - 1464