Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs

Times cited: 45
Authors
Seshadhri, C. [1]
Pinar, Ali [1]
Kolda, Tamara G. [1]
Affiliation
[1] Sandia Natl Labs, Livermore, CA 94550 USA
Keywords
triangle counting; clustering coefficients; directed triangles; triangle characteristics; wedge sampling; streaming algorithms
DOI
10.1002/sam.11224
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimating local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives. (C) 2014 Wiley Periodicals, Inc.
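The core idea of wedge sampling can be illustrated with a short sketch. A wedge is a path of length two centered at a vertex v; a vertex of degree d_v is the center of d_v(d_v − 1)/2 wedges, and a wedge is closed when its two endpoints are adjacent. Sampling wedges uniformly at random and measuring the fraction that are closed estimates the global clustering coefficient κ = 3T/W, where T is the number of triangles and W the total number of wedges, so T ≈ κ̂·W/3. The Python below is a minimal sketch under stated assumptions, not the authors' code: the graph is a hypothetical dict-of-sets adjacency structure (undirected, no self-loops, sortable node labels), and the helper names `wedge_sample_size` and `estimate_global_clustering` are illustrative. The sample size uses a standard Hoeffding bound, k = ⌈0.5 ε⁻² ln(2/δ)⌉, for additive error ε with probability at least 1 − δ. Only the global coefficient and triangle count are shown; the local, degree-wise, and directed variants mentioned in the abstract are not.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def wedge_sample_size(eps, delta):
    """Hoeffding-style sample count: the estimate of the closed-wedge
    fraction is within +/- eps of the truth with probability >= 1 - delta
    (assumed bound for this sketch)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_global_clustering(adj, k, seed=0):
    """Estimate the global clustering coefficient by uniform wedge sampling.

    adj: hypothetical dict mapping node -> set of neighbors (undirected).
    Returns (kappa_hat, triangle_estimate, total_wedges).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # Only vertices of degree >= 2 can be wedge centers.
    centers = [v for v in adj if len(adj[v]) >= 2]
    nbrs = {v: sorted(adj[v]) for v in centers}  # sorted for reproducibility
    wedges = [len(nbrs[v]) * (len(nbrs[v]) - 1) // 2 for v in centers]
    total = sum(wedges)
    if total == 0:
        return 0.0, 0.0, 0
    cumulative = list(accumulate(wedges))
    closed = 0
    for _ in range(k):
        # Choose a center with probability proportional to its wedge count.
        r = rng.randrange(total)
        v = centers[bisect_right(cumulative, r)]
        # Choose a uniform random pair of distinct neighbors of that center;
        # together these two steps make every wedge equally likely.
        a, b = rng.sample(nbrs[v], 2)
        if b in adj[a]:
            closed += 1  # the wedge a-v-b is closed by the edge (a, b)
    kappa = closed / k
    # Each triangle contains exactly three closed wedges.
    return kappa, kappa * total / 3, total


if __name__ == "__main__":
    # Complete graph K4: every wedge is closed, so the estimate is exact.
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(estimate_global_clustering(k4, wedge_sample_size(0.1, 0.01)))  # → (1.0, 4.0, 12)
```

Note that the required number of samples depends only on ε and δ, not on the size of the graph; for example, ε = 0.01 and δ = 0.001 call for roughly 38,000 wedge samples whether the graph has thousands or billions of edges. The per-sample cost is a binary search over the centers plus one adjacency lookup.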
Pages: 294-307
Number of pages: 14
Related papers
  • [1] Edge-Based Wedge Sampling to Estimate Triangle Counts in Very Large Graphs
    Turkoglu, Duru
    Turk, Ata
    2017 17th IEEE International Conference on Data Mining (ICDM), 2017: 455-464
  • [2] Estimating Clustering Coefficients via Metropolis-Hastings Random Walk and Wedge Sampling on Large OSN Graphs
    Cem, Emrah
    Sarac, Kamil
    2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC), 2016
  • [3] Revisiting Wedge Sampling for Triangle Counting
    Turk, Ata
    Turkoglu, Duru
    Web Conference 2019: Proceedings of the World Wide Web Conference (WWW 2019), 2019: 1875-1885
  • [4] Computing Triangle and Open-Wedge Heavy-Hitters in Large Networks
    Pavan, A.
    Quint, P.
    Scott, S.
    Vinodchandran, N. V.
    Smith, J.
    2016 IEEE International Conference on Big Data (Big Data), 2016: 998-1005
  • [5] Subgraph Counts in Random Clustering Graphs
    Chung, Fan
    Sieger, Nicholas
    Modelling and Mining Networks, WAW 2024, 2024, 14671: 1-16
  • [6] Dense Graphs With a Large Triangle Cover Have a Large Triangle Packing
    Yuster, Raphael
    Combinatorics Probability & Computing, 2012, 21(6): 952-962
  • [7] The missing log in large deviations for triangle counts
    Chatterjee, Sourav
    Random Structures & Algorithms, 2012, 40(4): 437-451
  • [8] Reservoir-based sampling over large graph streams to estimate triangle counts and node degrees
    Zhang, Lingling
    Jiang, Hong
    Wang, Fang
    Feng, Dan
    Xie, Yanwen
    Future Generation Computer Systems: The International Journal of eScience, 2020, 108: 244-255
  • [9] Computing Node Clustering Coefficients Securely
    Areekijseree, Katchaguy
    Tang, Yuzhe
    Soundarajan, Sucheta
    Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2019), 2019: 532-533
  • [10] Clustering coefficients of large networks
    Li, Yusheng
    Shang, Yilun
    Yang, Yiting
    Information Sciences, 2017, 382: 350-358