Parallel Clustering Based on Partitions of Local Minimal-Spanning-Trees

Times cited: 0
Authors
Tsui, Shiau-Rung [1]
Wang, Wei-Jen [1, 3]
Chen, Shi-Shan [1]
Chen, Lee Shu-Teng [2]
Wang, Chilung [2]
Affiliations
[1] Natl Cent Univ, Dept Comp Sci & Informat Engn, Jhongli, Taiwan
[2] Ind Technol Res Inst, Clouding Comp Ctr Mobile Applicat, Hsinchu, Taiwan
[3] Natl Cent Univ, Software Res Ctr, Taoyuan, Taiwan
Keywords
clustering; parallel computing; graph-based clustering
DOI
10.1109/PAAP.2012.25
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Many traditional clustering algorithms scale poorly to large data sets. A common remedy is to parallelize the algorithms and run them, together with the input data, on high-performance computers. However, many graph-based clustering algorithms are hard to parallelize because they must compute the similarity between all pairs of data nodes. In this paper, we propose a new parallel clustering algorithm, Para-CPLM (Parallel Clustering based on Partitions of Local Minimal-spanning-trees), which builds on three strategies: graph-based clustering, granular computing, and partition-and-merge. Para-CPLM partitions the data domain into several regions for parallel execution and builds a local minimal spanning tree in each region. Once the local trees are built, Para-CPLM combines them and applies a method, namely the GBC method, to determine the best number of clusters. After this first clustering phase, it repeatedly finds better inter-cluster pairs (edges) to reshape the merged tree so that the tree becomes closer to a global minimal spanning tree. It then applies the GBC method again to find the best number of clusters. In our experiments, Para-CPLM achieves significantly shorter execution time and better scalability than the sequential GBC method, and its clustering results are almost identical to those produced by the sequential GBC method.
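The partition-and-merge idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`prim_mst`, `partition`, `bridge`, `cluster`) is hypothetical. The abstract does not specify how regions are formed, how local trees are joined, or how the GBC method works, so the sketch assumes vertical strips of near-equal size, joins neighbouring strips by their shortest connecting edge, and substitutes a plain "remove the k-1 longest edges" cut for GBC. The refinement phase that moves the merged tree closer to a global minimal spanning tree is omitted. Threads are used only to show that the local trees are independent of one another; they are not expected to give a real speedup in CPython.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def prim_mst(points, ids):
    """Prim's algorithm on the complete Euclidean graph over the indices in `ids`."""
    if len(ids) < 2:
        return []
    # best[i] = (distance to the tree, nearest tree vertex) for each vertex outside the tree
    best = {i: (math.dist(points[ids[0]], points[i]), ids[0]) for i in ids[1:]}
    edges = []
    while best:
        v = min(best, key=lambda i: best[i][0])
        w, u = best.pop(v)
        edges.append((w, u, v))
        for i in best:
            d = math.dist(points[v], points[i])
            if d < best[i][0]:
                best[i] = (d, v)
    return edges


def partition(points, regions):
    """Assumed scheme: split point indices into vertical strips of near-equal size."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = math.ceil(len(order) / regions)
    return [order[s:s + size] for s in range(0, len(order), size)]


def bridge(points, a, b):
    """Shortest edge joining regions `a` and `b`, found by brute force."""
    return min((math.dist(points[u], points[v]), u, v) for u in a for v in b)


def cluster(points, regions, k):
    """Return one cluster label per point, using k clusters."""
    parts = partition(points, regions)
    # Each local tree depends only on its own region, so the builds can run concurrently.
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(lambda ids: prim_mst(points, ids), parts))
    tree = [edge for edges in local for edge in edges]
    # Join neighbouring strips so the merged structure is a single spanning tree.
    tree += [bridge(points, parts[i], parts[i + 1]) for i in range(len(parts) - 1)]
    # Stand-in for the GBC method (an assumption): drop the k-1 longest edges.
    tree.sort()
    kept = tree[:len(tree) - (k - 1)]
    # Union-find over the kept edges yields the connected components.
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    return [find(i) for i in range(len(points))]
```

For example, `cluster(points, regions=2, k=2)` on two well-separated groups of 2-D tuples should assign one label to each group. Because the merged tree is only an approximation of the global minimal spanning tree, the cut can differ from a sequential result on less cleanly separated data, which is the gap the paper's refinement phase is described as narrowing.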
Pages: 111-118
Number of pages: 8