Efficient Estimation of Triangles in Very Large Graphs

Cited by: 9
Authors
Etemadi, Roohollah [1]
Lu, Jianguo [1]
Tsin, Yung H. [1]
Affiliation
[1] Univ Windsor, Sch Comp Sci, Windsor, ON N9B 3P4, Canada
Keywords
Graph Sampling; Estimation; Triangles; Graph Algorithms; Clustering Coefficient
DOI
10.1145/2983323.2983849
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
The number of triangles in a graph is an important metric for understanding the graph. It is also directly related to the clustering coefficient of a graph, which is one of the most important indicators for social networks. Counting the number of triangles is computationally expensive for very large graphs. Hence, estimation is necessary for large graphs, particularly for graphs that are hidden behind searchable interfaces, where the graphs in their entirety are not available. For instance, the user networks of Twitter and Facebook are not available to third parties who want to explore their properties directly. This paper proposes a new method to estimate the number of triangles based on random edge sampling. It improves on traditional random edge sampling by probing the edges that have a higher probability of forming triangles. The method consistently outperforms the traditional method, and can be better by orders of magnitude when the graph is very large. The result is demonstrated on 20 graphs, including the largest graphs we could find. More importantly, we prove the improvement ratio and verify it on all the datasets. The analytical results are obtained by simplifying the variances of the estimators under the assumption that the graph is very large. We believe that such a big-data assumption can lead to interesting results not only in triangle estimation, but also in other sampling problems.
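As a rough illustration of the edge-sampling idea mentioned in the abstract, the sketch below estimates a triangle count from uniformly sampled edges. It is a generic textbook-style estimator, not the method proposed in the paper, and it does not implement the paper's probing strategy. It assumes the whole graph fits in memory as an adjacency map with integer node IDs, which the hidden-graph setting in the abstract does not allow. The function names (`build_adjacency`, `count_triangles_exact`, `estimate_triangles`) are hypothetical. The estimator relies on the fact that each triangle contains three edges, so the expected number of common neighbours of a uniformly random edge is 3T/m for a graph with T triangles and m edges.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles_exact(edges):
    """Exact count: sum common neighbours over all edges, divide by 3."""
    adj = build_adjacency(edges)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                total += len(adj[u] & adj[v])
    return total // 3


def estimate_triangles(edges, k, rng):
    """Estimate the triangle count from k edges sampled with replacement.

    Assumption (illustrative only): full adjacency is available, so the
    common neighbours of a sampled edge can be counted directly.
    """
    adj = build_adjacency(edges)
    # Deduplicate undirected edges so each one is equally likely.
    edge_list = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    m = len(edge_list)
    if m == 0 or k <= 0:
        return 0.0
    sample = [rng.choice(edge_list) for _ in range(k)]
    mean_common = sum(len(adj[u] & adj[v]) for u, v in sample) / k
    # E[mean_common] = 3T / m, hence T is estimated by m * mean_common / 3.
    return m * mean_common / 3


if __name__ == "__main__":
    # Complete graph on 5 nodes: every edge has 3 common neighbours.
    k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    print(count_triangles_exact(k5))
    print(estimate_triangles(k5, k=4, rng=random.Random(0)))
```

On the complete graph with 5 nodes every edge closes the same number of triangles, so the estimate equals the exact count of 10 for any sample. On real graphs the estimate varies with the sample, and its variance is the quantity the abstract says the paper analyses.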
Pages: 1251-1260
Page count: 10