Models of distributed data clustering in peer-to-peer environments

Cited by: 0
Authors
Khaled M. Hammouda
Mohamed S. Kamel
Affiliations
[1] Desire2Learn Inc., Department of Electrical and Computer Engineering, PAMI Group
[2] University of Waterloo
Source
KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 38 (02) : 303 - 329
Keywords
Distributed data clustering; Peer-to-peer data mining
DOI
Not available
Abstract
Distributed data mining applies data mining techniques to distributed data sources without first collecting the data at a central site. This is especially appealing when communication cost and privacy concerns restrict traditional centralized methods. Although distributed data mining has advanced on many fronts, we still lack models that abstract the process and expose the similarities and differences between the various methods. In this paper, we introduce two abstract models for distributed clustering in peer-to-peer (P2P) environments, each with a different goal. The first, the Locally optimized Distributed Clustering (LDC) model, aims to achieve better local clusters at each node through collaboration, in which nodes share summarized cluster information. The second, the Globally optimized Distributed Clustering (GDC) model, aims to achieve a single global clustering solution that approximates centralized clustering. We also report on concrete realizations of the two models, applied to text mining, that demonstrate their benefits. The LDC model is realized through the Collaborative P2P Clustering algorithm, and the GDC model through the Hierarchically distributed P2P Clustering algorithm. For the former, we show that peer collaboration significantly increases local clustering quality; the process relies on cluster summarization to exchange information between peers. For the latter, we target scalability by structuring the P2P network hierarchically, and we devise a distributed variant of the k-means algorithm that computes one set of clusters across the hierarchy. Experimental results demonstrate the effectiveness of both methods, and we make recommendations on when to use each.
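The abstract describes the GDC model's distributed k-means only at a high level. As a rough illustration of the general idea, and not a reproduction of the authors' Hierarchically distributed P2P Clustering algorithm, the sketch below shows one common way to run k-means across a hierarchy of peers. Each peer assigns its own points to the current centroids and passes up only per-cluster vector sums and counts. Supernodes add these up, and the root recomputes the centroids. The two-level hierarchy, the Euclidean distance, and all function names (`local_stats`, `merge_stats`, `hierarchical_round`, `centralized_round`) are assumptions made for this example. Text clustering commonly uses cosine similarity on document vectors instead.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
# Per-cluster sufficient statistics: cluster index -> (vector sum, point count).
Stats = Dict[int, Tuple[List[float], int]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(points: Sequence[Vector], centroids: Sequence[Vector]) -> Stats:
    """Hypothetical per-peer step: assign local points to the nearest centroid
    and report only per-cluster sums and counts, never the raw points."""
    dim = len(centroids[0])
    stats: Stats = {}
    for p in points:
        k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        s, n = stats.get(k, ([0.0] * dim, 0))
        stats[k] = ([si + pi for si, pi in zip(s, p)], n + 1)
    return stats


def merge_stats(parts: Sequence[Stats], dim: int) -> Stats:
    """Hypothetical supernode step: add up the statistics of its children."""
    merged: Stats = {}
    for part in parts:
        for k, (s, n) in part.items():
            ms, mn = merged.get(k, ([0.0] * dim, 0))
            merged[k] = ([a + b for a, b in zip(ms, s)], mn + n)
    return merged


def hierarchical_round(groups: Sequence[Sequence[Sequence[Vector]]],
                       centroids: Sequence[Vector]) -> List[Vector]:
    """One k-means iteration over an assumed two-level hierarchy:
    peers -> group supernode -> root. groups[g][i] holds peer i's points."""
    dim = len(centroids[0])
    group_stats = [merge_stats([local_stats(pts, centroids) for pts in g], dim)
                   for g in groups]
    total = merge_stats(group_stats, dim)
    # The root recomputes each centroid; an empty cluster keeps its old one.
    return [tuple(x / total[k][1] for x in total[k][0]) if k in total else tuple(c)
            for k, c in enumerate(centroids)]


def centralized_round(points: Sequence[Vector],
                      centroids: Sequence[Vector]) -> List[Vector]:
    """Baseline for comparison: a single peer holding all the data."""
    return hierarchical_round([[points]], centroids)


# Usage: two neighborhoods of two peers each, k = 2.
groups = [
    [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0)]],
    [[(9.0, 9.0), (10.0, 9.0)], [(9.0, 10.0), (10.0, 10.0)]],
]
centroids: List[Vector] = [(0.0, 0.0), (10.0, 10.0)]
for _ in range(5):
    centroids = hierarchical_round(groups, centroids)
print(centroids)  # [(0.3333333333333333, 0.3333333333333333), (9.5, 9.5)]
```

Because the per-cluster sums and counts are simply added up the tree, each `hierarchical_round` yields the same centroids as `centralized_round` on the pooled data for the same starting centroids (up to floating-point rounding). Meanwhile, each peer sends only k vector sums and k counts per round, however many points it holds. The paper describes the GDC result as an approximation of centralized clustering; in this simplified sketch the aggregation is exact. In a real network the root would also broadcast the new centroids back down the hierarchy, a step the sketch leaves implicit.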
Pages: 303-329
Page count: 26
Related papers
  • [1] Models of distributed data clustering in peer-to-peer environments
    Hammouda, Khaled M.
    Kamel, Mohamed S.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 38 (02) : 303 - 329
  • [2] Clustering distributed data streams in peer-to-peer environments
    Bandyopadhyay, Sanghamitra
    Giannella, Chris
    Maulik, Ujjwal
    Kargupta, Hillol
    Liu, Kun
    Datta, Souptik
    INFORMATION SCIENCES, 2006, 176 (14) : 1952 - 1985
  • [3] Uncertain Data Clustering in Distributed Peer-to-Peer Networks
    Zhou, Jin
    Chen, Long
    Chen, C. L. Philip
    Wang, Yingxu
    Li, Han-Xiong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2392 - 2406
  • [4] Improved clustering algorithm in Peer-to-Peer environments
    Tian, Ye
    Liu, Da-You
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2010, 40 (06) : 1639 - 1643
  • [5] On the characterization of peer-to-peer distributed virtual environments
    Rueda, S.
    Morillo, P.
    Orduna, J. M.
    Duato, J.
    IEEE VIRTUAL REALITY 2007, PROCEEDINGS, 2007 : 107 - +
  • [6] SDC: A distributed clustering protocol for peer-to-peer networks
    Li, Yan
    Lao, Li
    Cui, Jun-Hong
    NETWORKING 2006: NETWORKING TECHNOLOGIES, SERVICES, AND PROTOCOLS; PERFORMANCE OF COMPUTER AND COMMUNICATION NETWORKS; MOBILE AND WIRELESS COMMUNICATIONS SYSTEMS, 2006, 3976 : 1234 - 1239
  • [7] A peer-to-peer platform for simulating distributed virtual environments
    Rueda, Silvia
    Morillo, Pedro
    Orduna, Juan M.
    2007 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, VOLS 1 AND 2, 2007 : 555 - 562
  • [8] Implementation of a Distributed File Storage on Peer-to-Peer Environments
    Yang, Chao-Tung
    Chen, Hung-Yen
    Huang, Chih-Lin
    Tsaur, Shyh-Chang
    2009 10TH INTERNATIONAL SYMPOSIUM ON PERVASIVE SYSTEMS, ALGORITHMS, AND NETWORKS (ISPAN 2009), 2009 : 679 - +
  • [9] Distributed data mining in peer-to-peer networks
    Datta, Souptik
    Bhaduri, Kanishka
    Giannella, Chris
    Kargupta, Hillol
    Wolff, Ran
    IEEE INTERNET COMPUTING, 2006, 10 (04) : 18 - 26
  • [10] Hierarchically Distributed Peer-to-Peer Document Clustering and Cluster Summarization
    Hammouda, Khaled M.
    Kamel, Mohamed S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (05) : 681 - 698