Efficient approximation of correlated sums on data streams

Cited by: 14
Authors
Ananthakrishna, R
Das, A
Gehrke, J
Korn, F
Muthukrishnan, S
Srivastava, D
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
Keywords
correlated aggregates; data streams; approximation; summary structures; a priori error bounds; IP network management
DOI
10.1109/TKDE.2003.1198391
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries, formed by composing basic aggregates on (x, y) pairs, and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.
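To make the query form SUM{g(y) : x ≤ f(AGG(x))} concrete, the following is a minimal sketch of the exact, two-pass computation that the abstract says is not possible in one pass with limited storage. It is not the paper's method: it keeps all pairs in memory and is only an illustrative baseline. The function name `correlated_sum`, the helper `average`, and the sample data are assumptions introduced here.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # one (x, y) element of the stream


def average(values: Sequence[float]) -> float:
    """AVG as one possible choice for AGG."""
    return sum(values) / len(values)


def correlated_sum(
    pairs: Sequence[Pair],
    agg: Callable[[Sequence[float]], float],
    f: Callable[[float], float] = lambda v: v,
    g: Callable[[float], float] = lambda y: y,
) -> float:
    """Exact SUM{g(y) : x <= f(AGG(x))}, using two passes over stored data.

    Hypothetical baseline for illustration only: it buffers the whole
    input, which a limited-storage streaming algorithm cannot do.
    """
    # Pass 1: compute the basic aggregate over all x values.
    threshold = f(agg([x for x, _ in pairs]))
    # Pass 2: sum g(y) over the pairs whose x falls at or below the threshold.
    return sum(g(y) for x, y in pairs if x <= threshold)


# Made-up sample data: four (x, y) pairs.
pairs = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 40.0)]

cs_avg = correlated_sum(pairs, average)                   # threshold = AVG(x) = 3.0
cs_min = correlated_sum(pairs, min)                       # threshold = MIN(x) = 1.0
cs_2min = correlated_sum(pairs, min, f=lambda v: 2 * v)   # threshold = 2 * MIN(x) = 2.0
```

The second pass depends on a threshold that is only known once the first pass has finished, which is why a single pass needs some summary of the (x, y) pairs and yields an approximate answer.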
Pages: 569-572
Page count: 4