Collective, Hierarchical Clustering from distributed, heterogeneous data

Cited by: 14
Authors
Johnson, EL [1]
Kargupta, H [1]
Affiliations
[1] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
DOI
10.1007/3-540-46502-2_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents the Collective Hierarchical Clustering (CHC) algorithm for analyzing distributed, heterogeneous data. This algorithm first generates local cluster models and then combines them to generate the global cluster model of the data. The proposed algorithm runs in O(|S|n^2) time, with an O(|S|n) space requirement and an O(n) communication requirement, where n is the number of elements in the data set and |S| is the number of data sites. This approach shows significant improvement over naive methods, which incur O(n^2) communication costs when the entire distance matrix is transmitted and O(nm) communication costs when the data are centralized, where m is the total number of features. A specific implementation based on single-link clustering is presented, along with results comparing its performance with that of a centralized clustering algorithm. An analysis of the algorithm's complexity, in terms of overall computation time and communication requirements, is also presented.
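To make the communication figures in the abstract concrete, the minimal Python sketch below counts transmitted values under each strategy. It assumes constant factors of 1, which the abstract does not state, and the function name `communication_costs` is hypothetical. It illustrates growth rates only and is not the CHC algorithm itself.

```python
def communication_costs(n: int, m: int) -> dict[str, int]:
    """Order-of-magnitude counts of transmitted values (constant factors assumed to be 1)."""
    return {
        "full_distance_matrix": n * n,  # naive: transmit the entire n x n distance matrix
        "centralized_data": n * m,      # naive: transmit all m features of all n elements
        "chc": n,                       # CHC: O(n) communication, as stated in the abstract
    }


# Illustrative sizes only; these values are not taken from the paper.
costs = communication_costs(n=10_000, m=50)
for strategy, values in costs.items():
    print(f"{strategy}: {values:,} values")
```

With these illustrative sizes, the O(n) term is smaller than either naive cost, and the gap widens as n or m grows.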
Pages: 221 - 244
Page count: 24
Related papers
50 in total
  • [1] Collective Principal Component Analysis from Distributed, Heterogeneous Data
    Kargupta, Hillol
    Huang, Weiyun
    Sivakumar, Krishnamoorthy
    Park, Byung-Hoon
    Wang, Shuren
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 452 - 457
  • [2] Collective mining of Bayesian networks from distributed heterogeneous data
    R. Chen
    K. Sivakumar
    H. Kargupta
    [J]. Knowledge and Information Systems, 2004, 6 (2) : 164 - 187
  • [3] Collective mining of Bayesian networks from distributed heterogeneous data
    Chen, R
    Sivakumar, K
    Kargupta, H
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2004, 6 (02) : 164 - 187
  • [4] Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
    R. Chen
    K. Sivakumar
    H. Kargupta
    [J]. Knowledge and Information Systems, 2004, 6 : 164 - 187
  • [5] Heterogeneous distributed clustering by the fuzzy membership and hierarchical structure
    Huang, Jih-Jeng
    [J]. JOURNAL OF INDUSTRIAL AND PRODUCTION ENGINEERING, 2018, 35 (03) : 189 - 198
  • [6] Collective of algorithms with weights for clustering heterogeneous data.
    Berikov, Vladimir B.
    [J]. VESTNIK TOMSKOGO GOSUDARSTVENNOGO UNIVERSITETA-UPRAVLENIE VYCHISLITELNAJA TEHNIKA I INFORMATIKA-TOMSK STATE UNIVERSITY JOURNAL OF CONTROL AND COMPUTER SCIENCE, 2013, 23 (02) : 22 - 31
  • [7] Clustering on hierarchical heterogeneous data with prior pairwise relationships
    Wei Han
    Sanguo Zhang
    Hailong Gao
    Deliang Bu
    [J]. BMC Bioinformatics, 25
  • [8] Clustering on hierarchical heterogeneous data with prior pairwise relationships
    Han, Wei
    Zhang, Sanguo
    Gao, Hailong
    Bu, Deliang
    [J]. BMC BIOINFORMATICS, 2024, 25 (01)
  • [9] Heterogeneous Distributed Big Data Clustering on Sparse Grids
    Pfander, David
    Daiss, Gregor
    Pflueger, Dirk
    [J]. ALGORITHMS, 2019, 12 (03)
  • [10] Distributed information-based clustering of heterogeneous sensor data
    Chen, Jia
    Schizas, Ioannis D.
    [J]. SIGNAL PROCESSING, 2016, 126 : 35 - 51