Efficient approximation of correlated sums on data streams

Cited by: 14

Authors:

Ananthakrishna, R

Das, A

Gehrke, J

Korn, F

Muthukrishnan, S

Srivastava, D

Affiliations:

[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA

[2] AT&T Labs Res, Florham Pk, NJ 07932 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2003 / Vol. 15 / No. 3

Keywords:

correlated aggregates; data streams; approximation; summary structures; a priori error bounds; IP network management;

DOI:

10.1109/TKDE.2003.1198391

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries, formed by composing basic aggregates on (x, y) pairs and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.
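To make the query form concrete, the following is a minimal sketch of an exact, two-pass evaluation of a correlated sum over stored (x, y) pairs. It is not the streaming algorithm from the paper: it assumes the whole input fits in memory and can be scanned twice, which is exactly what the one-pass, limited-storage setting rules out. The function name `correlated_sum`, its parameters, and the sample data are illustrative assumptions.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence, Tuple

Pair = Tuple[float, float]  # one (x, y) record


def correlated_sum(
    pairs: Sequence[Pair],
    agg: Callable[[Iterable[float]], float],
    f: Callable[[float], float] = lambda a: a,
    g: Callable[[float], float] = lambda y: y,
) -> float:
    """Exact evaluation of SUM{g(y) : x <= f(AGG(x))} using two passes."""
    # Pass 1: compute the basic aggregate over all x values,
    # then apply the user-specified function f to get the threshold.
    threshold = f(agg(x for x, _ in pairs))
    # Pass 2: sum g(y) over the pairs whose x is at or below the threshold.
    return sum(g(y) for x, y in pairs if x <= threshold)


# Hypothetical sample data: four (x, y) pairs.
pairs = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 40.0)]

# AGG = AVG: the mean of x is 3.0, so the pairs with x <= 3.0 contribute.
print(correlated_sum(pairs, agg=mean))

# AGG = MIN with f doubling the aggregate: the threshold is 2.0.
print(correlated_sum(pairs, agg=min, f=lambda a: 2 * a))
```

The second pass depends on a threshold that is only known after the first pass completes, which illustrates why a single pass with limited storage has to settle for an approximation.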


Pages: 569-572

Page count: 4
