Models of distributed data clustering in peer-to-peer environments

Authors
Khaled M. Hammouda
Mohamed S. Kamel
Affiliations
[1] Desire2Learn Inc.
[2] PAMI Group, Department of Electrical and Computer Engineering, University of Waterloo
Keywords
Distributed data clustering; Peer-to-peer data mining
DOI
Not available
Abstract
Distributed data mining applies data mining techniques to distributed data sources without first collecting the data at a central site. This approach is especially appealing when communication cost or privacy concerns restrict the use of traditional centralized methods. Although distributed data mining has advanced on many fronts, we still lack models that abstract the process by showing the similarities and contrasts among the different methods. In this paper, we introduce two abstract models for distributed clustering in peer-to-peer (P2P) environments, each with a different goal. The first is the Locally optimized Distributed Clustering (LDC) model, which aims to achieve better local clusters at each node through collaboration based on sharing summarized cluster information. The second is the Globally optimized Distributed Clustering (GDC) model, which aims to achieve a single global clustering solution that approximates centralized clustering. We also report on concrete realizations of the two models, applied to text mining, that demonstrate their benefits. The LDC model is realized through the Collaborative P2P Clustering algorithm, while the GDC model is realized through the Hierarchically distributed P2P Clustering algorithm. For the former, we show that peer collaboration significantly increases local clustering quality; the process uses cluster summarization to exchange information between peers. For the latter, we target scalability by structuring the P2P network hierarchically, and we devise a distributed variant of the k-means algorithm that computes one set of clusters across the hierarchy. Experimental results demonstrate the effectiveness of both methods, and we make recommendations on when to use each.
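The abstract says only that the GDC realization uses a distributed variant of k-means over a hierarchical P2P network; it does not give the algorithm. The following is a minimal sketch of the general idea, assuming a common sufficient-statistics scheme in which each peer sends only per-cluster sums and counts upward, and each level of a two-level hierarchy (peers, neighborhood representatives, root) merges those summaries. It is not the authors' algorithm, and all names (`local_stats`, `merge_stats`, `update_centroids`, `hierarchical_kmeans`) are hypothetical. Because sums and counts add exactly, one round in this simplified scheme reproduces the centralized k-means update on the pooled data; the paper describes its GDC result as an approximation, so the actual method may differ in ways the abstract does not specify.

```python
import math
from typing import List, Tuple

Point = List[float]
# Hypothetical per-cluster summary exchanged between peers: (sum vectors, member counts).
Stats = Tuple[List[List[float]], List[int]]


def local_stats(points: List[Point], centroids: List[Point]) -> Stats:
    """Assign each of a peer's points to its nearest centroid and summarize.

    Only the per-cluster sums and counts leave the peer, never the raw points.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_stats(summaries: List[Stats]) -> Stats:
    """Add up summaries from several peers (or from lower hierarchy levels)."""
    k, dim = len(summaries[0][0]), len(summaries[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for s, c in summaries:
        for j in range(k):
            counts[j] += c[j]
            for d in range(dim):
                sums[j][d] += s[j][d]
    return sums, counts


def update_centroids(stats: Stats, old: List[Point]) -> List[Point]:
    """New centroid = sum / count; an empty cluster keeps its old centroid."""
    sums, counts = stats
    return [
        [x / counts[j] for x in sums[j]] if counts[j] else list(old[j])
        for j in range(len(sums))
    ]


def hierarchical_kmeans(groups: List[List[List[Point]]],
                        centroids: List[Point],
                        iterations: int = 10) -> List[Point]:
    """Two-level sketch: peers -> neighborhood representative -> root.

    groups[g][p] is the list of points held by peer p in neighborhood g.
    """
    for _ in range(iterations):
        # Level 1: each neighborhood merges the summaries of its own peers.
        group_stats = [
            merge_stats([local_stats(peer, centroids) for peer in group])
            for group in groups
        ]
        # Level 2: the root merges neighborhood summaries and computes new
        # centroids, which would then be sent back down to every peer.
        centroids = update_centroids(merge_stats(group_stats), centroids)
    return centroids


if __name__ == "__main__":
    groups = [
        [[[0.0, 0.0], [1.0, 0.0]], [[10.0, 10.0]]],    # neighborhood A: two peers
        [[[0.0, 1.0]], [[11.0, 10.0], [10.0, 11.0]]],  # neighborhood B: two peers
    ]
    print(hierarchical_kmeans(groups, [[0.0, 0.0], [10.0, 10.0]]))
```

With this toy data the centroids settle at roughly (1/3, 1/3) and (31/3, 31/3), the same values k-means would reach if all six points were pooled at one site. Each peer transmits k sum vectors and k counts per iteration regardless of how many points it holds, which is the property that makes a hierarchical arrangement attractive for scalability.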
Pages: 303-329
Page count: 26