Efficiently processing deterministic approximate aggregation query on massive data

Cited by: 0

Authors:

Xixian Han

Bailing Wang

Jianzhong Li

Hong Gao

Affiliation:

[1] Harbin Institute of Technology, School of Computer Science and Technology

Source:

Knowledge and Information Systems | 2018 / Vol. 57

Keywords:

Deterministic approximate aggregation; LDA; Massive data; Selection attribute lattice; Error reduction processing;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

In practical applications, aggregation is an important operation that returns statistical characterizations of a subset of a data set. On massive data, approximate aggregation is often preferable because it is more timely and responsive. This paper focuses on deterministic approximate aggregation, which returns a running aggregate within a progressive deterministic error interval. Existing methods either return approximate results with fixed accuracy, return online approximate aggregates with probabilistic confidence intervals, or incur a high I/O cost on massive data. This paper proposes the LDA algorithm to compute deterministic approximate aggregates on massive data efficiently. LDA uses a hierarchically structured selection attribute lattice to distribute tuples and obtain a horizontal partitioning of the table. In each partition, each selection attribute is kept in a column file and each ranking attribute is transposed into bit-slices. Given a selection condition, only the relevant partitions are involved in computing the running aggregate. A compact storage scheme based on the Z-order space-filling curve is proposed to reduce the management cost of the partitions. An error reduction method is devised to narrow the error interval while the running aggregate is computed. Extensive experimental results on synthetic and real data sets show that LDA has a significant performance advantage over existing algorithms.
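To illustrate how bit-slices can yield a running aggregate with a deterministic error interval, the following is a minimal sketch and not the LDA algorithm itself. It assumes unsigned integer attribute values, a single in-memory partition, and a selection bitmap that is already known; the names `to_bit_slices` and `running_sum` and the sample data are hypothetical. Slices are read from the most significant bit downward, and after each slice the unread low-order bits can add at most a fixed amount per selected row, which bounds the true SUM.

```python
from typing import Iterator, List, Tuple


def to_bit_slices(values: List[int], bits: int) -> List[int]:
    """Transpose unsigned integers into bit-slices.

    slices[k] is a Python int used as a bitmap: bit i of slices[k]
    is set when bit k of values[i] is set.
    """
    slices = [0] * bits
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        for k in range(bits):
            if (v >> k) & 1:
                slices[k] |= 1 << i
    return slices


def running_sum(slices: List[int], selected: int) -> Iterator[Tuple[int, int]]:
    """Yield (lower, upper) bounds on SUM over the selected rows.

    One interval is produced per slice, from the most significant
    slice to the least significant one.
    """
    n_selected = bin(selected).count("1")
    partial = 0
    for k in range(len(slices) - 1, -1, -1):
        # Rows that are selected and have bit k set contribute 2**k each.
        partial += (1 << k) * bin(slices[k] & selected).count("1")
        # Unread low-order bits add between 0 and 2**k - 1 per selected row.
        slack = n_selected * ((1 << k) - 1)
        yield partial, partial + slack


values = [5, 12, 7, 9]   # hypothetical attribute values, 4 bits each
selected = 0b1011        # rows 0, 1 and 3 satisfy the selection condition
slices = to_bit_slices(values, bits=4)
intervals = list(running_sum(slices, selected))
for lower, upper in intervals:
    print(lower, upper)
```

The exact sum over the selected rows is 5 + 12 + 9 = 26; every interval in the sequence contains it, and the last interval has zero width. A caller could stop reading slices as soon as the interval is narrow enough for its purpose, which is the trade-off between I/O and accuracy that the abstract describes.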

Citation

Pages: 437-473

Page count: 36

[1] Efficiently processing deterministic approximate aggregation query on massive data
Han, Xixian
Wang, Bailing
Li, Jianzhong
Gao, Hong
[J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 57 (02) : 437 - 473