Models of distributed data clustering in peer-to-peer environments

Cited by: 0
Authors
Khaled M. Hammouda
Mohamed S. Kamel
Affiliations
[1] Desire2Learn Inc., Department of Electrical and Computer Engineering, PAMI Group
[2] University of Waterloo
Keywords
Distributed data clustering; Peer-to-peer data mining
DOI
Not available
Abstract
Distributed data mining mines distributed data sources without first collecting the data at a central site. This has significant appeal when communication cost and privacy concerns restrict traditional centralized methods. Although there has been development on many fronts in distributed data mining, we still lack models that abstract the process by showing the similarities and contrasts between the different methods. In this paper, we introduce two abstract models for distributed clustering in peer-to-peer environments, each with a different goal. The first is the Locally optimized Distributed Clustering (LDC) model, which aims to achieve better local clusters at each node and is facilitated by collaboration through the sharing of summarized cluster information. The second is the Globally optimized Distributed Clustering (GDC) model, which aims to achieve one global clustering solution that approximates centralized clustering. We also report on concrete realizations of the two models, applied to text mining, that show their benefits. The LDC model is realized through the Collaborative P2P Clustering algorithm, while the GDC model is realized through the Hierarchically distributed P2P Clustering algorithm. In the former, we show that peer collaboration results in a significant increase in local clustering quality; the process uses cluster summarization to exchange information between peers. In the latter, we target scalability by structuring the P2P network hierarchically and devise a distributed variant of the k-means algorithm to compute one set of clusters across the hierarchy. We demonstrate the effectiveness of both methods through experimental results and make recommendations on when to use each method.
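The abstract does not spell out how the distributed k-means variant works, so the following is only a minimal sketch of the general idea behind globally optimized clustering, not the paper's Hierarchically distributed P2P Clustering algorithm. It assumes each peer shares per-cluster summaries (coordinate sums and counts) instead of raw data, and that summaries are merged to update one shared set of centroids. All names (`local_summary`, `merge`, `distributed_kmeans`) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
# Per-cluster summary a peer shares instead of raw data: (coordinate sums, count).
Summary = List[Tuple[List[float], int]]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def local_summary(data: List[Point], centroids: List[Point]) -> Summary:
    """Assign a peer's local points to the current centroids and summarize."""
    dim = len(centroids[0])
    summary: Summary = [([0.0] * dim, 0) for _ in centroids]
    for point in data:
        k = nearest(point, centroids)
        sums, count = summary[k]
        summary[k] = ([s + p for s, p in zip(sums, point)], count + 1)
    return summary


def merge(summaries: List[Summary]) -> Summary:
    """Add summaries cluster by cluster. The operation is associative, so it
    could be applied level by level in a hierarchy (an assumption here)."""
    merged = summaries[0]
    for other in summaries[1:]:
        merged = [
            ([a + b for a, b in zip(s1, s2)], n1 + n2)
            for (s1, n1), (s2, n2) in zip(merged, other)
        ]
    return merged


def distributed_kmeans(
    peers: List[List[Point]], centroids: List[Point], rounds: int = 10
) -> List[Point]:
    """Run k-means rounds in which only cluster summaries leave each peer."""
    for _ in range(rounds):
        merged = merge([local_summary(data, centroids) for data in peers])
        centroids = [
            tuple(s / n for s in sums) if n > 0 else old  # keep empty clusters
            for (sums, n), old in zip(merged, centroids)
        ]
    return centroids


# Two peers, each holding part of the data set.
peers = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
result = distributed_kmeans(peers, [(0.0, 0.0), (10.0, 0.0)], rounds=2)
```

Because the merged sums and counts equal those computed over the pooled data, each round in this sketch yields the same centroids as a centralized k-means round with the same initial centroids, while only summaries cross the network.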
Pages: 303-329
Number of pages: 26