Efficiently processing deterministic approximate aggregation query on massive data

Authors
Xixian Han
Bailing Wang
Jianzhong Li
Hong Gao
Affiliation
[1] Harbin Institute of Technology, School of Computer Science and Technology
Source
Knowledge and Information Systems, 2018, 57(2): 437-473
Keywords
Deterministic approximate aggregation; LDA; Massive data; Selection attribute lattice; Error reduction processing
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
In practical applications, aggregation is an important operation that returns statistical characterizations of a subset of a data set. On massive data, approximate aggregation is often preferable because of its better timeliness and responsiveness. This paper focuses on deterministic approximate aggregation, which returns a running aggregate within a progressive deterministic error interval. Existing methods either return approximate results with fixed accuracy, return online approximate aggregates with probabilistic confidence intervals, or incur a high I/O cost on massive data. This paper proposes the LDA algorithm to compute deterministic approximate aggregates on massive data efficiently. LDA uses a hierarchically structured selection attribute lattice to distribute tuples and obtain a horizontal partitioning of the table. In each partition, each selection attribute is kept in a column file and each ranking attribute is transposed into bit-slices. Given a selection condition, only the relevant partitions are involved in computing the running aggregate. A compact storage scheme based on the Z-order space-filling curve is proposed to reduce the management cost of the partitions, and an error reduction method is devised to reduce the error interval when computing the running aggregate. Extensive experimental results on synthetic and real data sets show that LDA has a significant performance advantage over existing algorithms.
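The abstract mentions attributes transposed into bit-slices and a running aggregate with a deterministic error interval. The following is a minimal sketch of how those two ideas can fit together for a SUM over non-negative integers. It is not the LDA algorithm from the paper: the function names `to_bit_slices` and `progressive_sum`, the most-significant-bit-first layout, and the use of an in-memory selection mask are all assumptions made for illustration.

```python
from typing import List, Tuple


def to_bit_slices(values: List[int], width: int) -> List[List[int]]:
    """Transpose non-negative integers into bit-slices, most significant bit first.

    Slice i holds bit (width - 1 - i) of every value. Hypothetical helper.
    """
    return [[(v >> b) & 1 for v in values] for b in range(width - 1, -1, -1)]


def progressive_sum(slices: List[List[int]],
                    selected: List[int]) -> List[Tuple[int, int]]:
    """Return a (lower, upper) bound on SUM after reading each slice.

    `selected` is a 0/1 mask marking tuples that satisfy the selection
    condition. The bounds are deterministic: the exact sum always lies
    inside every interval, and the interval shrinks as slices are read.
    """
    width = len(slices)
    n_selected = sum(selected)
    partial = 0
    bounds = []
    for i, bit_slice in enumerate(slices):
        bit = width - 1 - i
        ones = sum(x for x, keep in zip(bit_slice, selected) if keep)
        partial += ones << bit
        # Each selected tuple can still add at most 2^bit - 1 from unread bits.
        slack = n_selected * ((1 << bit) - 1)
        bounds.append((partial, partial + slack))
    return bounds


if __name__ == "__main__":
    slices = to_bit_slices([5, 3, 7, 2], width=3)
    for lower, upper in progressive_sum(slices, selected=[1, 1, 0, 1]):
        print(lower, upper)
```

In this sketch, stopping after any slice gives an answer whose error is bounded with certainty rather than with a probabilistic confidence level, which is the distinction the abstract draws against online sampling-based methods.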
Pages: 437-473
Page count: 36
Related papers
  • [1] Efficiently processing deterministic approximate aggregation query on massive data
    Han, Xixian
    Wang, Bailing
    Li, Jianzhong
    Gao, Hong
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 57 (02) : 437 - 473
  • [2] Efficiently processing (p, ε)-approximate join aggregation on massive data
    Han, Xixian
    Li, Jianzhong
    Gao, Hong
    [J]. INFORMATION SCIENCES, 2014, 278 : 773 - 792
  • [3] A Histogram based Analytical Approximate Query Processing for Massive Data
    Wang, Yijun
    Wang, Hanhu
    Li, Hui
    [J]. INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY II, PTS 1-4, 2013, 411-414 : 362 - 365
  • [5] TDEP: efficiently processing top-k dominating query on massive data
    Han, Xixian
    Li, Jianzhong
    Gao, Hong
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (03) : 689 - 718
  • [6] TKAP: Efficiently processing top-k query on massive data by adaptive pruning
    Han, Xixian
    Liu, Xianmin
    Li, Jianzhong
    Gao, Hong
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 47 (02) : 301 - 328
  • [8] Approximate Query Processing for Interactive Data Science
    Kraska, Tim
    [J]. SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 525 - 525
  • [9] Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing
    Liang, Xi
    Sintos, Stavros
    Shang, Zechao
    Krishnan, Sanjay
    [J]. SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1129 - 1141
  • [10] An Online Approximate Aggregation Query Processing Method Based on Hadoop
    Zhang, Zhiqiang
    Hu, Jianghua
    Xie, Xiaoqin
    Pan, Haiwei
    Feng, Xiaoning
    [J]. 2016 IEEE 20TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2016, : 117 - 122