Efficient approximation of correlated sums on data streams

Cited by: 14
Authors
Ananthakrishna, R
Das, A
Gehrke, J
Korn, F
Muthukrishnan, S
Srivastava, D
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
Keywords
correlated aggregates; data streams; approximation; summary structures; a priori error bounds; IP network management
DOI
10.1109/TKDE.2003.1198391
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries, formed by composing basic aggregates on (x, y) pairs, and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.
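To make the query form in the abstract concrete, the following is a minimal sketch of an exact, offline evaluation of a correlated sum over a small in-memory list of (x, y) pairs. It is not the paper's streaming algorithm: it stores all pairs and reads them twice, which is exactly what a limited-storage, one-pass setting rules out. The function name `correlated_sum` and the sample data are illustrative assumptions.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]


def correlated_sum(
    pairs: Sequence[Pair],
    agg: Callable[[Sequence[float]], float],
    f: Callable[[float], float] = lambda a: a,
    g: Callable[[float], float] = lambda y: y,
) -> float:
    """Exact SUM{g(y) : x <= f(AGG(x))} over stored pairs (hypothetical helper).

    First pass: compute the aggregate over all x values.
    Second pass: sum g(y) for the pairs whose x is within the threshold.
    """
    threshold = f(agg([x for x, _ in pairs]))
    return sum(g(y) for x, y in pairs if x <= threshold)


# Illustrative data only, not taken from the paper.
pairs = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 40.0)]

# AGG = AVG, f = identity: sum y over pairs with x <= average x.
result_avg = correlated_sum(pairs, agg=mean)

# AGG = MIN, f(m) = 2 * m: sum y over pairs with x <= twice the minimum x.
result_min = correlated_sum(pairs, agg=min, f=lambda m: 2 * m)

# g(y) = 1 turns the sum into a correlated count.
count_avg = correlated_sum(pairs, agg=mean, g=lambda y: 1)
```

The need for the second pass is the point of the sketch: the threshold f(AGG(x)) is only known once the whole stream has been seen, which is why the abstract turns to approximate answers built on summary structures.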
Pages: 569-572
Page count: 4
Related papers
50 in total
  • [31] Top-k Correlated Subgraph Query for Data Streams
    Pan, Shirui
    Zhu, Xingquan
    Fang, Meng
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2906 - 2909
  • [32] TWStream: Finding correlated data streams under time warping
    Wang, T
    FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 213 - 225
  • [33] Time-decayed correlated aggregates over data streams
    Cormode, Graham
    Tirthapura, Srikanta
    Xu, Bojian
    Statistical Analysis and Data Mining, 2009, 2 (5-6): : 294 - 310
  • [34] Finding Correlated Heavy-Hitters over Data Streams
    Lahiri, Bibudh
    Tirthapura, Srikanta
    2009 IEEE 28TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCC 2009), 2009, : 307 - 314
  • [35] Efficient incremental subspace clustering in data streams
    Kontaki, Maria
    Papadopoulos, Apostolos N.
    Manolopoulos, Yannis
    10TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2006, : 53 - 60
  • [36] An Efficient Itemset Mining Approach for Data Streams
    Baralis, Elena
    Cerquitelli, Tania
    Chiusano, Silvia
    Grand, Alberto
    Grimaudo, Luigi
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II: 15TH INTERNATIONAL CONFERENCE, KES 2011, 2011, 6882 : 515 - 523
  • [37] Efficient reservoir sampling for transactional data streams
    Dash, Manoranjan
    Ng, Willie
    ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 662 - +
  • [38] Efficient object tracking in WAAS data streams
    Clarke, Trevor R. H.
    Canosa, Roxanne
    REAL-TIME IMAGE AND VIDEO PROCESSING 2011, 2011, 7871
  • [39] Towards Efficient KNN Joins on Data Streams
    Yang, Chong
    Yu, Xiaohui
    Liu, Yang
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 782 - 783
  • [40] Efficient Aggregation Methods for Probabilistic Data Streams
    Goman, Maksim
    BUSINESS MODELING AND SOFTWARE DESIGN, BMSD 2018, 2018, 319 : 116 - 132