Reservoir-based sampling over large graph streams to estimate triangle counts and node degrees

Times cited: 1
Authors
Zhang, Lingling [1]
Jiang, Hong [2]
Wang, Fang [1]
Feng, Dan [1]
Xie, Yanwen [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[2] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76019 USA
Keywords
Reservoir sampling; Graph streams; Triangle counts; Node degrees; Network
DOI
10.1016/j.future.2020.02.077
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Reservoir sampling is widely used to characterize large graph streams by producing samples of edges. However, existing reservoir-based sampling methods focus mainly on counting triangles and perform poorly at capturing the topological characteristics reflected by node degrees. This paper proposes a new method, triangle-induced reservoir sampling (T-Sample), that counts triangles and estimates node degrees simultaneously and efficiently. T-Sample processes every edge in a graph stream only once and uses a carefully designed dual sampling mechanism that performs both uniform and non-uniform sampling. Specifically, the uniform sampling counts triangles with a newly proposed method whose estimation variance is smaller than that of existing reservoir-based sampling methods, while the non-uniform sampling ensures that the sampled edges are connected. Experiments on real datasets show that T-Sample counts triangles with smaller estimation errors and variances than state-of-the-art reservoir-based sampling methods, while obtaining much more accurate node-degree information at lower time and memory costs. © 2020 Elsevier B.V. All rights reserved.
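For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal Python sketch of a generic single-reservoir triangle counter in the style of TRIÈST-impr (De Stefani et al.). It is not T-Sample: it has only one uniform reservoir and none of the paper's non-uniform, connectivity-preserving sampling. The class and method names (`ReservoirTriangleCounter`, `process`, `degree`, `capacity`) are hypothetical, and the sketch assumes an insertion-only stream of undirected edges with no duplicates. The degree estimate is a plain inverse-probability (Horvitz-Thompson) scaling of the sampled degree, shown only to illustrate the kind of estimate a uniform reservoir yields.

```python
import random
from collections import defaultdict
from typing import Hashable, Optional


class ReservoirTriangleCounter:
    """Hypothetical sketch: one uniform edge reservoir (TRIEST-impr style).

    NOT the paper's T-Sample. Assumes an insertion-only stream of
    undirected edges with no duplicate edges.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity                   # M: max sampled edges
        self.rng = random.Random(seed)
        self.t = 0                                 # edges seen so far
        self.edges: list = []                      # reservoir contents
        self.adj = defaultdict(set)                # adjacency of the sample
        self.triangles = 0.0                       # running triangle estimate

    def _weight(self) -> float:
        # eta(t) = max(1, (t-1)(t-2) / (M(M-1))): inverse probability that
        # both earlier edges of a triangle closed at time t are still sampled.
        m = self.capacity
        return max(1.0, (self.t - 1) * (self.t - 2) / (m * (m - 1)))

    def _insert(self, u: Hashable, v: Hashable) -> None:
        self.edges.append((u, v))
        self.adj[u].add(v)
        self.adj[v].add(u)

    def _evict_random(self) -> None:
        # Swap the chosen edge to the end so removal is O(1).
        i = self.rng.randrange(len(self.edges))
        self.edges[i], self.edges[-1] = self.edges[-1], self.edges[i]
        u, v = self.edges.pop()
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def process(self, u: Hashable, v: Hashable) -> None:
        if u == v:
            return                                 # ignore self-loops
        self.t += 1
        # Count triangles closed by (u, v) against the current sample
        # before deciding whether to keep the new edge.
        common = self.adj[u] & self.adj[v]
        self.triangles += len(common) * self._weight()
        # Reservoir step: keep the t-th edge with probability min(1, M/t).
        if self.t <= self.capacity:
            self._insert(u, v)
        elif self.rng.random() < self.capacity / self.t:
            self._evict_random()
            self._insert(u, v)

    def degree(self, node: Hashable) -> float:
        # Each edge is sampled with probability min(1, M/t), so dividing the
        # sampled degree by that probability gives an unbiased estimate.
        p = min(1.0, self.capacity / self.t) if self.t else 1.0
        return len(self.adj[node]) / p


if __name__ == "__main__":
    # Complete graph K4: 6 edges, 4 triangles, every node has degree 3.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    c = ReservoirTriangleCounter(capacity=10, seed=0)
    for a, b in k4:
        c.process(a, b)
    print(c.triangles)   # → 4.0 (all edges fit, so the count is exact)
    print(c.degree(0))   # → 3.0
```

Once the stream exceeds the reservoir capacity, both estimates become random; the triangle estimate stays unbiased, but uniformly sampled edges are rarely connected, which is the weakness the abstract attributes to existing reservoir-based methods when analyzing node degrees.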
Pages: 244-255
Number of pages: 12