Distributed Community Detection in Web-Scale Networks

Authors
Ovelgoenne, Michael [1]
Affiliations
[1] Univ Maryland, UMIACS, College Pk, MD 20740 USA
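Keywords
Graph Clustering; Community Detection; Distributed Algorithms; MapReduce
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract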
Partitioning large networks into smaller subnetworks (communities) is an important tool to analyze the structure of complex linked systems. In recent years, many in-memory community detection algorithms have been proposed for graphs with millions of edges. Analyzing massive graphs with billions of edges is impossible for existing algorithms. In this contribution, we show how to find community partitions of networks with billions of edges. Our approach is based on an ensemble learning scheme for community detection that provides a way to identify high-quality partitions from an ensemble of lower-quality partitions. We present a pre-processing procedure for community detection algorithms that significantly decreases the problem size. After reducing the problem size, traditional non-distributed community detection algorithms can be applied. We implemented a weak but highly scalable label propagation algorithm on top of the distributed-computing framework Apache Hadoop. The evaluation of our implementation on a 50-node Hadoop cluster and with evaluation datasets of up to 3.3 billion edges shows very good results with respect to clustering quality as well as scalability. For a smaller 260-million-edge network, we show that our pre-processing can improve the results of the popular Louvain modularity clustering algorithm.
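The pipeline outlined in the abstract (several runs of a weak label propagation algorithm, an ensemble step, and a reduced graph handed to a conventional algorithm) can be illustrated with a small in-memory Python sketch. This is not the paper's Hadoop implementation. It assumes that the ensemble is combined by keeping together only those nodes that every base partition places in the same community, which the abstract does not spell out. The names `label_propagation`, `core_groups`, and `contract` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def label_propagation(adj, rng, max_iters=20):
    """Weak base clusterer: each node repeatedly adopts the most common
    label among its neighbours. adj maps node -> list of neighbours."""
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)  # random asynchronous update order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue  # current label is already a majority label
            labels[v] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break
    return labels


def core_groups(partitions):
    """Assumed ensemble step: nodes stay together only if they share a
    label in every partition of the ensemble."""
    groups = defaultdict(list)
    for v in partitions[0]:
        groups[tuple(p[v] for p in partitions)].append(v)
    return list(groups.values())


def contract(adj, groups):
    """Collapse each group into one super-node and count the edges
    between groups. The result is the reduced problem instance."""
    owner = {v: i for i, g in enumerate(groups) for v in g}
    weights = Counter()
    for v, nbrs in adj.items():
        for u in nbrs:
            a, b = owner[v], owner[u]
            if a < b:  # each undirected edge is counted exactly once
                weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    # Toy graph (an assumption for illustration): two triangles and a bridge.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    rng = random.Random(0)
    ensemble = [label_propagation(adj, rng) for _ in range(4)]
    groups = core_groups(ensemble)
    print(groups)
    print(dict(contract(adj, groups)))
```

The contracted graph has one node per group, so a non-distributed algorithm such as Louvain could then be run on it. How much the graph shrinks depends on how strongly the base partitions agree.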
Pages: 72-79
Page count: 8