Better Algorithms for Counting Triangles in Data Streams

Cited by: 36
Authors
McGregor, Andrew [1]
Vorotnikova, Sofya [1]
Vu, Hoa T. [1]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01003 USA
Keywords
data streams; triangles; clustering coefficients; GRAPH; SUBGRAPH;
DOI
10.1145/2902251.2902283
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor $1 + \epsilon$. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles $T$ (or a lower bound on this quantity) and other parameters including the number of nodes $n$ and the number of edges $m$. Two models are important in the literature: the arbitrary order model, in which the stream consists of the edges of the graph in arbitrary order, and the adjacency list order model, in which all edges incident to the same node appear consecutively. We improve over the state-of-the-art results in both models. For the adjacency list order model, we show that $\tilde{O}(\epsilon^{-2} m/\sqrt{T})$ space is sufficient in one pass and $\tilde{O}(\epsilon^{-2} m^{3/2}/T)$ space is sufficient in two passes, where the $\tilde{O}(\cdot)$ notation suppresses log factors. For the arbitrary order model, we show that $\tilde{O}(\epsilon^{-2} m/\sqrt{T})$ space suffices given two passes and that $\tilde{O}(\epsilon^{-2} m^{3/2}/T)$ space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for $\ell_p$ sampling such that multiple independent samples can be generated with $O(\mathrm{polylog}\, n)$ update time; this primitive is widely applicable and this result may be of independent interest.
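For readers unfamiliar with the "wedge sampling" idea mentioned in the abstract, the following is a minimal offline sketch of the basic estimator. It is not the streaming algorithm of the paper: it assumes the whole graph fits in memory, and the function name `wedge_sampling_triangle_estimate` and its parameters are hypothetical. A wedge is a path of length two; each triangle contains three closed wedges, so the triangle count can be estimated as (fraction of sampled wedges that are closed) × (total number of wedges) / 3.

```python
import random
from collections import defaultdict


def wedge_sampling_triangle_estimate(edges, num_samples, seed=0):
    """Estimate the triangle count of a simple undirected graph.

    Illustrative, non-streaming sketch (an assumption of this example,
    not the paper's method): the full adjacency structure is stored.
    """
    rng = random.Random(seed)

    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # A node of degree d is the center of d*(d-1)/2 wedges.
    centers = [v for v in adj if len(adj[v]) >= 2]
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    total_wedges = sum(weights)
    if total_wedges == 0 or num_samples <= 0:
        return 0.0

    # Sample wedges uniformly: pick a center proportionally to its wedge
    # count, then pick two distinct neighbors uniformly at random.
    closed = 0
    for c in rng.choices(centers, weights=weights, k=num_samples):
        a, b = rng.sample(list(adj[c]), 2)
        if b in adj[a]:  # the wedge a - c - b is closed by edge (a, b)
            closed += 1

    # Each triangle accounts for exactly three closed wedges.
    return (closed / num_samples) * total_wedges / 3
```

On a complete graph every wedge is closed, so the estimate is exact there; on general graphs the accuracy depends on `num_samples` and on the fraction of closed wedges.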
Pages: 401-411
Number of pages: 11
Related papers
50 in total
  • [21] Counting Clean Triangles
    Mizan R. Khan
    Riaz R. Khan
    The Mathematical Intelligencer, 2023, 45 : 9 - 15
  • [22] COUNTING EQUILATERAL TRIANGLES
    MOSER, WOJ
    FREITAG, HT
    FIBONACCI QUARTERLY, 1980, 18 (04): : 371 - 372
  • [23] Range counting over multidimensional data streams
    Suri, Subhash
    Toth, Csaba D.
    Zhou, Yunhong
    DISCRETE & COMPUTATIONAL GEOMETRY, 2006, 36 (04) : 633 - 655
  • [24] Practical Range Counting over Data Streams
    Bai, Ran
    Lai, Ziliang
    Lo, Eric
    Hon, Wing-Kai
    Zhang, Pengfei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 659 - 668
  • [25] An Approximate Counting for Big Textual Data Streams
    Raymond, Rudy
    Koyanagi, Teruo
    Osogami, Takayuki
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1085 - +
  • [26] Range Counting over Multidimensional Data Streams
    Subhash Suri
    Csaba D. Toth
    Yunhong Zhou
    Discrete & Computational Geometry, 2006, 36 : 633 - 655
  • [27] Counting Triangles in Real-World Graph Streams: Dealing with Repeated Edges and Time Windows
    Jha, Madhav
    Pinar, Ali
    Seshadhri, C.
    2015 49TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2015, : 1507 - 1514
  • [28] TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
    De Stefani, Lorenzo
    Epasto, Alessandro
    Riondato, Matteo
    Upfal, Eli
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 825 - 834
  • [29] Approximately Counting Triangles in Large Graph Streams Including Edge Duplicates with a Fixed Memory Usage
    Wang, Pinghui
    Qi, Yiyan
    Sun, Yu
    Zhang, Xiangliang
    Tao, Jing
    Guan, Xiaohong
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 11 (02): : 162 - 175
  • [30] A Comparison of Clustering Algorithms for Data Streams
    Pereira, Cassio M. M.
    de Mello, Rodrigo F.
    INTEGRATED COMPUTING TECHNOLOGY, 2011, 165 : 59 - 74