Picube for Fast Exploration of Large Datasets

Cited by: 0

Authors:

Fu, Wenxiao [1]

Affiliations:

[1] York Univ, Toronto, ON, Canada

Source:

2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020) | 2020

Keywords:

inductive aggregation; data-cube; partition; flexibility; efficiency; size;

DOI:

10.1109/ICDE48307.2020.00246

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Hierarchical aggregation supports fast exploration of large datasets by pre-aggregating data into a multi-scale data structure. While the pre-aggregation process is done offline, it can be quite expensive, and the resulting data-structure extremely large. When the data is multi-dimensional, this is greatly compounded. Data-cube-based approaches can result in extremely large cubic data-structures, aggregating more than is needed. Other approaches do not aggregate enough, and so do not offer the necessary flexibility for dimension-wise roll-ups and drill-downs. We design a hierarchical data-structure for aggregation that strikes a balance, and provides enough flexibility for different exploration scenarios with low-cost construction and reasonable size. Inductive aggregation is a methodology to compute levels of aggregations efficiently, while the resulting data-structure supports smooth data exploration. Inspired by this, we propose a partitioned, inductively aggregated data-cube, picube. A framework we call stratum space is presented with the model to express the dependencies across aggregation levels. Optimization choices are discussed, providing good design tradeoffs between storing and querying of the data.
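The idea of computing each aggregation level from the level below it, rather than from the raw data, can be illustrated with a minimal sketch. This is not the paper's picube or its stratum space framework; the 2-D integer grid, the count aggregate, the coarsening factor of 2, and the names `roll_up` and `build_levels` are all assumptions made for illustration only.

```python
from collections import defaultdict

def roll_up(cells, factor=2):
    """Aggregate one level of (x, y) -> count cells into the next coarser level."""
    coarser = defaultdict(int)
    for (x, y), count in cells.items():
        # Each coarse cell sums the counts of the fine cells it covers.
        coarser[(x // factor, y // factor)] += count
    return dict(coarser)

def build_levels(points, depth):
    """Build depth + 1 levels; each level is derived from the previous one."""
    base = defaultdict(int)
    for x, y in points:
        base[(x, y)] += 1
    levels = [dict(base)]
    for _ in range(depth):
        # Inductive step: reuse the previous level instead of rescanning the points.
        levels.append(roll_up(levels[-1]))
    return levels

# Hypothetical sample data.
points = [(0, 0), (1, 0), (2, 3), (3, 3), (3, 3)]
levels = build_levels(points, depth=2)
```

Here `levels[0]` holds the finest cells and `levels[-1]` the coarsest, so a drill-down or roll-up amounts to moving between adjacent entries of the list. Sum-like aggregates such as counts compose this way; aggregates like medians would need a different approach.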

Pages: 2069 - 2073

Page count: 5

50 records in total

[1] Fast exploration and classification of large hyperspectral image datasets for early bruise detection on apples
Ferrari, Carlotta
Foca, Giorgia
Calvini, Rosalba
Ulrici, Alessandro
[J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 146 : 108 - 119
[2] Online Outlier Exploration Over Large Datasets
Cao, Lei
Wei, Mingrui
Yang, Di
Rundensteiner, Elke A.
[J]. KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015 : 89 - 98
[3] Fast Bayesian inference for large occupancy datasets
Diana, Alex
Dennis, Emily Beth
Matechou, Eleni
Morgan, Byron John Treharne
[J]. BIOMETRICS, 2023, 79 (03) : 2503 - 2515
[4] Fast approximating triangulation of large scattered datasets
Weimer, H
Warren, J
[J]. ADVANCES IN ENGINEERING SOFTWARE, 1999, 30 (06) : 389 - 400
[5] Fast Bayesian hyperparameter optimization on large datasets
Klein, Aaron
Falkner, Stefan
Bartels, Simon
Hennig, Philipp
Hutter, Frank
[J]. ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (02) : 4945 - 4968
[6] Comparison of fast regression algorithms in large datasets
Cangur, Sengul
Ankarali, Handan
[J]. KUWAIT JOURNAL OF SCIENCE, 2023, 50 (02)
[7] Fast Robust Model Selection in Large Datasets
Dupuis, Debbie J.
Victoria-Feser, Maria-Pia
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2011, 106 (493) : 203 - 212
[8] Using bitmap index for interactive exploration of large datasets
Wu, KS
Koegler, W
Chen, J
Shoshani, A
[J]. SSDBM 2002: 15TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2003 : 65 - 74
[9] Topic Summary Views for Exploration of Large Scholarly Datasets
Castano, Silvana
Ferrara, Alfio
Montanelli, Stefano
[J]. JOURNAL ON DATA SEMANTICS, 2018, 7 (03) : 155 - 170
[10] INTEGRATIVE EXPLORATION OF LARGE HIGH-DIMENSIONAL DATASETS
Pardy, Christopher
Galbraith, Sally
Wilson, Susan R.
[J]. ANNALS OF APPLIED STATISTICS, 2018, 12 (01) : 178 - 199