Picube for Fast Exploration of Large Datasets

Cited by: 0
Author
Fu, Wenxiao [1]
Affiliation
[1] York Univ, Toronto, ON, Canada
Keywords
inductive aggregation; data-cube; partition; flexibility; efficiency; size
DOI
10.1109/ICDE48307.2020.00246
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Hierarchical aggregation supports fast exploration of large datasets by pre-aggregating data into a multi-scale data structure. Although the pre-aggregation is done offline, it can be quite expensive, and the resulting data structure can be extremely large. When the data are multi-dimensional, these costs are greatly compounded. Data-cube-based approaches can result in extremely large cubic data structures that aggregate more than is needed. Other approaches do not aggregate enough, and so do not offer the flexibility needed for dimension-wise roll-ups and drill-downs. We design a hierarchical data structure for aggregation that strikes a balance between these extremes, providing enough flexibility for different exploration scenarios with low construction cost and reasonable size. Inductive aggregation is a methodology for computing aggregation levels efficiently, and the resulting data structure supports smooth data exploration. Inspired by this, we propose picube, a partitioned, inductively aggregated data cube. Alongside the model, we present a framework called stratum space that expresses the dependencies across aggregation levels. We also discuss optimization choices that provide good design tradeoffs between storing and querying the data.
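The abstract describes inductive aggregation only at a high level. As an illustration of the general idea, and not of picube itself (its partitioning scheme and stratum-space model are not reproduced here), the following minimal Python sketch assumes a 2-D grid of counts whose side lengths are powers of two. The names `roll_up` and `inductive_lattice` are hypothetical. Each aggregation level (i, j), meaning i halvings along the first dimension and j along the second, is computed from an already-computed finer neighbour rather than from the raw data.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def roll_up(grid: Grid, axis: int) -> Grid:
    """Halve resolution along one axis by summing adjacent pairs of cells.

    Assumes the grid has an even number of cells along `axis`.
    """
    if axis == 0:
        # Merge consecutive pairs of rows.
        return [[a + b for a, b in zip(grid[r], grid[r + 1])]
                for r in range(0, len(grid), 2)]
    # Merge consecutive pairs of columns within each row.
    return [[row[c] + row[c + 1] for c in range(0, len(row), 2)]
            for row in grid]


def inductive_lattice(base: Grid, levels0: int,
                      levels1: int) -> Dict[Tuple[int, int], Grid]:
    """Compute every (i, j) aggregation level inductively (hypothetical helper).

    Level (i, j) is derived from (i, j - 1) when j > 0, otherwise from
    (i - 1, 0), so each step touches only an already-aggregated grid.
    """
    lattice: Dict[Tuple[int, int], Grid] = {(0, 0): base}
    for i in range(levels0 + 1):
        for j in range(levels1 + 1):
            if (i, j) == (0, 0):
                continue
            if j > 0:
                lattice[(i, j)] = roll_up(lattice[(i, j - 1)], axis=1)
            else:
                lattice[(i, j)] = roll_up(lattice[(i - 1, j)], axis=0)
    return lattice


if __name__ == "__main__":
    base = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    lattice = inductive_lattice(base, 2, 2)
    # Fully rolled up along the first dimension, full resolution along the second.
    print(lattice[(2, 0)])  # [[28, 32, 36, 40]]
```

For a 4×4 base grid with two levels per dimension, the lattice holds 9 grids. Keeping every (i, j) pair is what permits dimension-wise roll-ups: level (2, 0) collapses the first dimension while keeping the second at full resolution. A pyramid that stored only the diagonal levels (0, 0), (1, 1), (2, 2) would be smaller but could not serve that view directly, which illustrates the kind of size versus flexibility tradeoff the abstract describes.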
Pages: 2069-2073
Page count: 5