Hierarchically Distributed Peer-to-Peer Document Clustering and Cluster Summarization

Cited by: 27
Authors
Hammouda, Khaled M. [1]
Kamel, Mohamed S. [2]
Affiliations
[1] Desire2Learn Inc, Kitchener, ON N2G 1B9, Canada
[2] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
Distributed data mining; distributed document clustering; hierarchical peer-to-peer networks; categorization
DOI
10.1109/TKDE.2008.189
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In distributed data mining, adopting a flat node distribution model can affect scalability. To address the problem of modularity, flexibility, and scalability, we propose a Hierarchically distributed Peer-to-Peer (HP2PC) architecture and clustering algorithm. The architecture is based on a multilayer overlay network of peer neighborhoods. Supernodes, which act as representatives of neighborhoods, are recursively grouped to form higher level neighborhoods. Within a certain level of the hierarchy, peers cooperate within their respective neighborhoods to perform P2P clustering. Using this model, we can partition the clustering problem in a modular way across neighborhoods, solve each part individually using a distributed K-means variant, then successively combine clusterings up the hierarchy where increasingly more global solutions are computed. In addition, for document clustering applications, we summarize the distributed document clusters using a distributed keyphrase extraction algorithm, thus providing interpretation of the clusters. Results show decent speedup, reaching 165 times faster than centralized clustering for a 250-node simulated network, with comparable clustering quality to the centralized approach. We also provide comparison to the P2P K-means algorithm and show that HP2PC accuracy is better for typical hierarchy heights. Results for distributed cluster summarization match those of their centralized counterparts with up to 88 percent accuracy.
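The partition-then-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' HP2PC algorithm: the function names (`local_stats`, `neighborhood_kmeans`, `merge_up`), the exchange of per-cluster sums and counts between peers, and the nearest-centroid weighted merge at the supernode level are all assumptions chosen for illustration, and the hierarchy here has only two levels.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(points, centroids):
    """One peer: assign its own points to the nearest centroid and
    return per-cluster coordinate sums and counts (no raw data leaves the peer)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def neighborhood_kmeans(peers, centroids, iters=10):
    """Hypothetical distributed K-means inside one neighborhood:
    peers share only (sums, counts), which are pooled to update the centroids."""
    k, dim = len(centroids), len(centroids[0])
    counts = [0] * k
    for _ in range(iters):
        stats = [local_stats(pts, centroids) for pts in peers]
        counts = [sum(s[1][j] for s in stats) for j in range(k)]
        updated = []
        for j in range(k):
            if counts[j] == 0:
                updated.append(list(centroids[j]))  # empty cluster keeps its centroid
            else:
                updated.append(
                    [sum(s[0][j][d] for s in stats) / counts[j] for d in range(dim)]
                )
        centroids = updated
    return centroids, counts


def merge_up(results):
    """Hypothetical supernode step: fold each sibling neighborhood's centroids
    into the nearest centroid of the first neighborhood, weighted by cluster size."""
    base = [list(c) for c in results[0][0]]
    weights = list(results[0][1])
    for cents, counts in results[1:]:
        for c, w in zip(cents, counts):
            if w == 0:
                continue
            j = min(range(len(base)), key=lambda i: sq_dist(c, base[i]))
            total = weights[j] + w
            base[j] = [(base[j][d] * weights[j] + c[d] * w) / total
                       for d in range(len(c))]
            weights[j] = total
    return base, weights


# Two neighborhoods of two peers each, two clusters, shared initial centroids.
init = [(0, 0), (10, 10)]
hood_a = neighborhood_kmeans([[(0, 0), (0, 2)], [(10, 10), (10, 12)]], init)
hood_b = neighborhood_kmeans([[(2, 0), (2, 2)], [(12, 10), (12, 12)]], init)
global_centroids, global_sizes = merge_up([hood_a, hood_b])
```

In this sketch each neighborhood solves its own part, and only centroids and cluster sizes travel up to the merging step, which is the modularity the abstract describes. A deeper hierarchy would apply `merge_up` repeatedly, one level at a time.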
Pages: 681 - 698
Number of pages: 18
Related papers
50 in total
  • [1] HP2PC: Scalable Hierarchically-Distributed Peer-to-Peer Clustering
    Hammouda, Khaled M.
    Kamel, Mohamed S.
    [J]. PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2007 : 485 - 490
  • [2] Hierarchically distributed Peer-to-Peer architecture for Computational Grid
    Gomathi, S.
    Manimegalai, D.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON GREEN HIGH PERFORMANCE COMPUTING (ICGHPC), 2013
  • [3] Models of distributed data clustering in peer-to-peer environments
    Khaled M. Hammouda
    Mohamed S. Kamel
    [J]. Knowledge and Information Systems, 2014, 38 : 303 - 329
  • [4] Models of distributed data clustering in peer-to-peer environments
    Hammouda, Khaled M.
    Kamel, Mohamed S.
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 38 (02) : 303 - 329
  • [5] SDC: A distributed clustering protocol for peer-to-peer networks
    Li, Yan
    Lao, Li
    Cui, Jun-Hong
    [J]. NETWORKING 2006: NETWORKING TECHNOLOGIES, SERVICES, AND PROTOCOLS; PERFORMANCE OF COMPUTER AND COMMUNICATION NETWORKS; MOBILE AND WIRELESS COMMUNICATIONS SYSTEMS, 2006, 3976 : 1234 - 1239
  • [6] Uncertain Data Clustering in Distributed Peer-to-Peer Networks
    Zhou, Jin
    Chen, Long
    Chen, C. L. Philip
    Wang, Yingxu
    Li, Han-Xiong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2392 - 2406
  • [7] Clustering distributed data streams in peer-to-peer environments
    Bandyopadhyay, Sanghamitra
    Giannella, Chris
    Maulik, Ujjwal
    Kargupta, Hillol
    Liu, Kun
    Datta, Souptik
    [J]. INFORMATION SCIENCES, 2006, 176 (14) : 1952 - 1985
  • [8] Transfer Collaborative Fuzzy Clustering in Distributed Peer-to-Peer Networks
    Dang, Bozhan
    Wang, Yingxu
    Zhou, Jin
    Wang, Rongrong
    Chen, Long
    Chen, C. L. Philip
    Zhang, Tong
    Han, Shiyuan
    Wang, Lin
    Chen, Yuehui
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (02) : 500 - 514
  • [9] Swarm-based distributed clustering in peer-to-peer systems
    Folino, Gianluigi
    Forestiero, Agostino
    Spezzano, Giandomenico
    [J]. ARTIFICIAL EVOLUTION, 2006, 3871 : 37 - 48
  • [10] A distributed approach to node clustering in decentralized peer-to-peer networks
    Ramaswamy, L
    Gedik, B
    Liu, L
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2005, 16 (09) : 814 - 829