Edge-Based Wedge Sampling to Estimate Triangle Counts in Very Large Graphs

Cited by: 9
Authors
Turkoglu, Duru [1]
Turk, Ata [2]
Affiliations
[1] Depaul Univ, Chicago, IL 60604 USA
[2] Boston Univ, Boston, MA 02215 USA
DOI
10.1109/ICDM.2017.55
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of triangles in a graph is useful for deducing many important features of the network that the graph models. However, computing this number exactly is computationally expensive. Hence, a number of approximation algorithms based on random sampling of edges or of wedges (adjacent edge pairs) have been proposed for estimating it. We argue that, for large sparse graphs with a power-law degree distribution, random edge sampling requires a large number of sampled edges before it provides enough information for accurate estimation, and that existing wedge sampling methods produce biased samples, which in turn lead to less accurate estimates. In this paper, we propose a hybrid of edge and wedge sampling that addresses the deficiencies of both approaches. We start with uniform edge sampling and then extend each selected edge to form a wedge that is more informative for estimating the overall triangle count. The core quantity we estimate is the number of triangles in which each edge sampled in the first phase participates. This approach provides accurate approximations with very small sampling ratios, requiring up to 8 times fewer samples than the state of the art while providing estimates with 95% confidence.
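The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how a sampled edge is extended into a wedge or how the per-edge counts are combined, so the choices below are assumptions made for illustration only. The sketch extends each sampled edge from its lower-degree endpoint and scales by the identity that the per-edge triangle counts sum to three times the total. The function names `build_adjacency` and `estimate_triangles` are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def estimate_triangles(edges, num_samples, seed=None):
    """Estimate the triangle count from uniformly sampled edges.

    Assumed scheme (not taken from the paper):
      1. Sample an edge (u, v) uniformly at random, with replacement.
      2. Let u be the lower-degree endpoint and pick one other
         neighbour w of u uniformly, forming the wedge v-u-w.
      3. If w is adjacent to v the wedge is closed, and (deg(u) - 1)
         is an unbiased estimate of the triangles through (u, v).
    Per-edge triangle counts sum to 3T, so T ~ m * mean / 3.
    """
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    # Deduplicate undirected edges; assumes node labels are comparable.
    edge_list = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    m = len(edge_list)
    if m == 0 or num_samples <= 0:
        return 0.0

    total = 0.0
    for _ in range(num_samples):
        u, v = rng.choice(edge_list)
        if len(adj[u]) > len(adj[v]):
            u, v = v, u  # extend from the lower-degree endpoint
        candidates = len(adj[u]) - 1  # neighbours of u other than v
        if candidates == 0:
            continue  # this edge cannot lie on any triangle
        w = rng.choice([x for x in adj[u] if x != v])
        if w in adj[v]:
            total += candidates
    return m * total / (3.0 * num_samples)


if __name__ == "__main__":
    # Complete graph on 4 nodes: every wedge is closed.
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(estimate_triangles(k4, num_samples=50, seed=1))
```

Extending from the lower-degree endpoint keeps the scaling factor `deg(u) - 1` small, which under this assumed scheme should reduce the variance contributed by high-degree hubs in power-law graphs. Whether the paper makes the same choice cannot be determined from the abstract.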
Pages: 455-464
Page count: 10