Efficient approximation of correlated sums on data streams

Cited by: 14
Authors
Ananthakrishna, R
Das, A
Gehrke, J
Korn, F
Muthukrishnan, S
Srivastava, D
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
Keywords
correlated aggregates; data streams; approximation; summary structures; a priori error bounds; IP network management
DOI
10.1109/TKDE.2003.1198391
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries, formed by composing basic aggregates on (x, y) pairs and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.
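To make the CS-aggregate definition concrete, the following Python sketch (illustrative only, not taken from the paper) first evaluates SUM{g(y) : x ≤ f(AGG(x))} exactly. This requires retaining every pair, because the threshold f(AGG(x)) is known only once the stream ends. It then contrasts that with a simple one-pass summary for AGG = AVG. The summary is a fixed-width histogram over an assumed integer x-domain [0, domain), not the Greenwald-Khanna-based structures the paper uses, and it returns lower and upper bounds under the assumption g(y) ≥ 0. The names cs_sum_exact, BucketedAvgCS, domain and nbuckets are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]


def cs_sum_exact(pairs: Sequence[Pair],
                 agg: Callable[[List[int]], float],
                 f: Callable[[float], float],
                 g: Callable[[int], float]) -> float:
    """Exact SUM{g(y) : x <= f(AGG(x))}.

    The threshold f(AGG(x)) is only known after the last pair arrives,
    so all pairs are kept and scanned a second time.
    """
    threshold = f(agg([x for x, _ in pairs]))
    return sum(g(y) for x, y in pairs if x <= threshold)


class BucketedAvgCS:
    """Hypothetical one-pass summary for AGG = AVG (NOT the paper's structure).

    Assumes integer x in [0, domain) and g(y) >= 0. Stores the running sum of
    g(y) per fixed-width x-bucket, plus a count and sum of x so AVG(x) is exact.
    """

    def __init__(self, domain: int, nbuckets: int, g: Callable[[int], float]):
        if domain % nbuckets != 0:
            raise ValueError("domain must be a multiple of nbuckets")
        self.domain = domain
        self.width = domain // nbuckets
        self.g = g
        self.sums = [0.0] * nbuckets
        self.n = 0
        self.x_total = 0

    def insert(self, x: int, y: int) -> None:
        if not 0 <= x < self.domain:
            raise ValueError("x outside the assumed domain")
        self.n += 1
        self.x_total += x
        self.sums[x // self.width] += self.g(y)

    def query(self, f: Callable[[float], float]) -> Tuple[float, float]:
        """Return (lower, upper) bounds on SUM{g(y) : x <= f(AVG(x))}."""
        if self.n == 0:
            raise ValueError("empty stream")
        t = f(self.x_total / self.n)
        lower = upper = 0.0
        for i, s in enumerate(self.sums):
            first, last = i * self.width, (i + 1) * self.width - 1
            if last <= t:        # every x in this bucket satisfies x <= t
                lower += s
                upper += s
            elif first <= t:     # bucket straddles t: it may or may not count
                upper += s
        return lower, upper


# Usage: f and g are the identity; AVG(x) = 4, so pairs with x <= 4 qualify.
pairs = [(1, 10), (3, 20), (5, 30), (7, 40)]
exact = cs_sum_exact(pairs, mean, lambda a: a, lambda y: y)
summary = BucketedAvgCS(domain=8, nbuckets=4, g=lambda y: y)
for x, y in pairs:
    summary.insert(x, y)
bounds = summary.query(lambda a: a)
print(exact, bounds)  # 30 (30.0, 60.0)
```

Because the buckets are contiguous, at most one bucket straddles the threshold. The gap between the bounds is therefore the g(y) mass of that single bucket, and narrower buckets tighten it at the cost of more memory. The paper instead obtains a priori bounds from two variants of Greenwald and Khanna's quantile summary.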
Pages: 569-572
Number of pages: 4
Related papers
(50 in total)
  • [41] Efficient Optimized Query Mesh for Data Streams
    Mohamed, Fatma
    Ismail, Rasha
    Badr, Nagwa
    Tolba, Mohamed Fahmy
    2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014: 157 - 163
  • [42] An efficient strategy for finding the patterns of data streams
    Jiao, F
    He, GM
    Proceedings of the 11th Joint International Computer Conference, 2005: 617 - 620
  • [43] Efficient aggregate computation over data streams
    Nagaraj, Kanthi
    Naidu, K. V. M.
    Rastogi, Rajeev
    Satkin, Scott
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008: 1382 - +
  • [44] APPROXIMATION WITH EXPONENTIAL SUMS
    BRAESS, D
    COMPUTING, 1967, 2 (04) : 309 - &
  • [45] The approximation of certain sums
    Denjoy, A
    COMPTES RENDUS HEBDOMADAIRES DES SEANCES DE L ACADEMIE DES SCIENCES, 1937, 204 : 1396 - 1398
  • [46] Efficient Data Streams Processing in the Real Time Data Warehouse
    Majeed, Fiaz
    Mahmood, Muhammad Sohaib
    Iqbal, Mujahid
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (ICCSIT 2010), VOL 5, 2010: 57 - 61
  • [47] Decision Trees for Mining Data Streams Based on the Gaussian Approximation
    Rutkowski, Leszek
    Jaworski, Maciej
    Pietruczuk, Lena
    Duda, Piotr
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (01) : 108 - 119
  • [48] Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees
    Rajagopalan, Anand
    Vitale, Fabio
    Vainstein, Danny
    Citovsky, Gui
    Procopiuc, Cecilia M.
    Gentile, Claudio
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [49] Approximation Algorithms for Massive High-Rate Data Streams
    Cuzzocrea, Alfredo
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, 2013, 185 : 59 - 68
  • [50] Correlated sums of r(n)
    Chamizo, F
    JOURNAL OF THE MATHEMATICAL SOCIETY OF JAPAN, 1999, 51 (01) : 237 - 252