Tiered Sampling: An Efficient Method for Counting Sparse Motifs in Massive Graph Streams

Times cited: 2
Authors
De Stefani, Lorenzo [1]
Terolli, Erisa [2]
Upfal, Eli [1]
Affiliations
[1] Brown Univ, Dept Comp Sci, 115 Waterman St, Providence, RI 02906 USA
[2] Max Planck Inst Informat, Stuhlsatzenhausweg 4, D-66123 Saarbrucken, Germany
Keywords
Graph motif mining; reservoir sampling; stream computing
DOI
10.1145/3441299
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We introduce TIERED SAMPLING, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses memory of a fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs, i.e., sub-graph patterns that have a low probability of appearing in a sample of M edges of the graph, M being the maximum amount of data available to the algorithms at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes TIERED SAMPLING to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort needed to account for the specific properties of that pattern. We present a complete theoretical analysis and an extensive experimental evaluation of the proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge-sample approach for counting sparse motifs in large-scale graphs.
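The abstract describes the base tier as a standard reservoir sample of edges. The Python sketch below is illustrative only and is not the authors' implementation. It shows such a sample using Vitter's Algorithm R, together with a hypothetical helper, `prob_all_sampled`, which computes the probability that k fixed edges are all resident in a reservoir of size M after t stream edges. The sketch shows why sparse motifs are hard to catch with a single edge sample: a triangle needs 3 co-resident edges, while a 4-clique needs 6. Reservoir-based estimators commonly weight each detected occurrence by the inverse of such a probability to stay unbiased. Whether the paper's estimator does exactly this is an assumption here.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirSample(Generic[T]):
    """Fixed-size uniform sample of a stream (Vitter's Algorithm R)."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity          # M: memory budget, in items
        self.items: List[T] = []          # current sample
        self.seen = 0                     # t: stream elements observed so far
        self.rng = rng or random.Random()

    def offer(self, item: T) -> bool:
        """Process one stream element; return True if it is now in the sample."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)       # fill phase: keep everything
            return True
        # Replacement phase: keep the new item with probability M / t,
        # overwriting a uniformly chosen resident item.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item
            return True
        return False


def prob_all_sampled(k: int, capacity: int, seen: int) -> float:
    """Hypothetical helper: probability that k fixed stream items are all in a
    reservoir of size `capacity` after `seen` items, prod_{i<k} (M - i) / (t - i)."""
    if seen <= capacity:
        return 1.0                        # nothing has been evicted yet
    if k > capacity:
        return 0.0                        # k items cannot fit at once
    p = 1.0
    for i in range(k):
        p *= (capacity - i) / (seen - i)
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    sample: ReservoirSample = ReservoirSample(capacity=1_000, rng=rng)
    for u in range(10_000):               # toy stream of 10,000 edges
        sample.offer((u, u + 1))
    print(len(sample.items), sample.seen)  # 1000 10000
    # A triangle needs 3 co-resident edges, a 4-clique needs 6.
    print(prob_all_sampled(3, 1_000, 10_000))
    print(prob_all_sampled(6, 1_000, 10_000))
```

With M = 1,000 and t = 10,000, the first probability is roughly 10^-3 and the second roughly 10^-6. This gap is the one the abstract's upper tiers, which store more frequent sub-structures of the motif, are meant to close.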
Pages: 52
Related papers
18 records in total
  • [1] On Sampling from Massive Graph Streams
    Ahmed, Nesreen K.
    Duffield, Nick
    Willke, Theodore L.
    Rossi, Ryan A.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (11): 1430-1441
  • [2] MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams
    Lim, Yongsub
    Kang, U.
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015: 685-694
  • [3] Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs
    Lim, Yongsub
    Jung, Minsoo
    Kang, U.
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2018, 12 (01)
  • [4] Improved Triangle Counting in Graph Streams: Power of Multi-Sampling
    Kavassery-Parakkat, Neeraj
    Hanjani, Kiana Mousavi
    Pavan, A.
    2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018: 33-40
  • [5] Mixed Random Sampling of Frames method for counting number of motifs
    Yudina, M. N.
    Zadorozhnyi, V. N.
    Yudin, E. B.
    MECHANICAL SCIENCE AND TECHNOLOGY UPDATE (MSTU 2019), 2019, 1260
  • [6] WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams
    Shin, Kijung
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017: 1087-1092
  • [7] Temporal locality-aware sampling for accurate triangle counting in real graph streams
    Lee, Dongjin
    Shin, Kijung
    Faloutsos, Christos
    VLDB JOURNAL, 2020, 29 (06): 1501-1525
  • [8] An efficient in situ method for sampling periphyton in lakes and streams
    Peters, L
    Scheifhacken, N
    Kahlert, M
    Rothhaupt, KO
    ARCHIV FUR HYDROBIOLOGIE, 2005, 163 (01): 133-141
  • [9] BSR-TC: Adaptively Sampling for Accurate Triangle Counting over Evolving Graph Streams
    Xuan, Wei
    Cao, Huawei
    Yan, Mingyu
    Tang, Zhimin
    Ye, Xiaochun
    Fan, Dongrui
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2021, 31 (11N12): 1561-1581