Approximately Counting Triangles in Large Graph Streams Including Edge Duplicates with a Fixed Memory Usage

Cited by: 26
Authors
Wang, Pinghui [1, 5]
Qi, Yiyan [1]
Sun, Yu [1]
Zhang, Xiangliang [2]
Tao, Jing [1]
Guan, Xiaohong [1, 3, 4]
Affiliations
[1] Xi An Jiao Tong Univ, NSKEYLAB, Shenzhen, Peoples R China
[2] King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[4] Tsinghua Univ, NLIST Lab, Beijing, Peoples R China
[5] Xi An Jiao Tong Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Source
Proceedings of the VLDB Endowment | 2017 / Vol. 11 / No. 2
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Algorithms
DOI
10.14778/3149193.3149197
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Counting triangles in a large graph is important for detecting network anomalies such as spam web pages and suspicious accounts (e.g., fraudsters and advertisers) on online social networks. However, it is challenging to compute the number of triangles in a large graph represented as a stream of edges at a low computational cost when memory is limited. Recently, several effective sampling-based approximation methods have been developed to solve this problem. However, they assume that the graph stream of interest contains no duplicate edges, which does not hold in many real-world graph streams (e.g., phone calling networks). In this paper, we observe that these methods exhibit a large estimation error or computational cost even when modified to deal with duplicate edges using deduplication techniques such as Bloom filters and hash-based sampling. To address this challenge, we design a one-pass streaming algorithm for uniformly sampling distinct edges at high speed. Compared to state-of-the-art algorithms, our algorithm reduces the sampling cost per edge from O(log k), where k is the maximum number of sampled edges determined by the available memory space, to O(1) without using any additional memory space. Based on the sampled edges, we develop a simple yet accurate method to infer the number of triangles in the original graph stream. We conduct extensive experiments on a variety of large real-world graphs, and the results demonstrate that our method is several times more accurate and faster than state-of-the-art methods with the same memory usage.
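To make the idea of hash-based sampling of distinct edges concrete, the following minimal Python sketch may help. It is not the algorithm proposed in the paper: it uses a fixed sampling probability p, so its memory is not bounded, and all function names and the choice of BLAKE2b are assumptions made for illustration. Because every copy of an edge hashes to the same value, duplicates cannot change which distinct edges are kept. If the hash is assumed to behave like independent uniform random values, each triangle survives with probability p^3, so dividing the sampled triangle count by p^3 gives an unbiased estimate.

```python
import hashlib


def edge_key(u, v):
    """Canonical form of an undirected edge, so (u, v) and (v, u) coincide."""
    return (u, v) if u <= v else (v, u)


def edge_hash01(edge, seed=0):
    """Map an edge to a pseudo-random value in [0, 1) (illustrative hash choice)."""
    data = f"{seed}:{edge[0]}:{edge[1]}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def sample_distinct_edges(stream, p, seed=0):
    """Keep each distinct edge with probability p, in O(1) expected time per arrival."""
    sampled = set()
    for u, v in stream:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        e = edge_key(u, v)
        if edge_hash01(e, seed) < p:
            sampled.add(e)  # re-adding a duplicate is a no-op
    return sampled


def count_triangles(edges):
    """Exact triangle count of a set of canonical undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def estimate_triangles(stream, p, seed=0):
    """Scale the sampled triangle count by 1 / p^3."""
    sampled = sample_distinct_edges(stream, p, seed)
    return count_triangles(sampled) / p**3
```

With p = 1.0 every distinct edge is kept and the estimate equals the exact triangle count, which is a convenient sanity check. A fixed-memory variant would have to cap the sample at k edges and adjust the scaling factor accordingly, which is the setting the abstract addresses.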
Pages: 162-175
Page count: 14