Picube for Fast Exploration of Large Datasets

Times cited: 0
Author
Fu, Wenxiao [1]
Affiliation
[1] York Univ, Toronto, ON, Canada
Keywords
inductive aggregation; data-cube; partition; flexibility; efficiency; size
DOI
10.1109/ICDE48307.2020.00246
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hierarchical aggregation supports fast exploration of large datasets by pre-aggregating data into a multi-scale data structure. Although the pre-aggregation is done offline, it can be quite expensive, and the resulting data structure can be extremely large. When the data is multi-dimensional, both problems are greatly compounded. Data-cube-based approaches can produce extremely large cubic data structures that aggregate more than is needed. Other approaches do not aggregate enough, and so lack the flexibility needed for dimension-wise roll-ups and drill-downs. We design a hierarchical data structure for aggregation that strikes a balance between the two: it is flexible enough for different exploration scenarios while keeping construction cost low and size reasonable. Inductive aggregation is a methodology for computing levels of aggregation efficiently, and the resulting data structure supports smooth data exploration. Inspired by this, we propose picube, a partitioned, inductively aggregated data cube. We also present a framework, called stratum space, that models the dependencies across aggregation levels. Finally, we discuss optimization choices that offer good design trade-offs between storing and querying the data.
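The abstract describes computing aggregation levels inductively, with dimension-wise roll-ups. The sketch below illustrates only that generic idea, not picube itself: it assumes a 2-D grid of counts, summation as the aggregate, and power-of-two grid sizes, and the names `rollup` and `build_levels` are hypothetical. Each coarser level is derived from an already-computed neighbouring level instead of from the raw data.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def rollup(grid: Grid, axis: int) -> Grid:
    """Halve resolution along one axis by summing adjacent pairs of cells.

    Assumption: the grid size along `axis` is even.
    """
    if axis == 0:
        # Merge row r with row r + 1.
        return [[a + b for a, b in zip(grid[r], grid[r + 1])]
                for r in range(0, len(grid), 2)]
    # Merge column c with column c + 1 in every row.
    return [[row[c] + row[c + 1] for c in range(0, len(row), 2)]
            for row in grid]


def build_levels(base: Grid, depth0: int, depth1: int) -> Dict[Tuple[int, int], Grid]:
    """Compute every (i, j) aggregation level inductively.

    Level (i, j) has been rolled up i times along axis 0 and j times along
    axis 1. Each level is derived from a previously computed neighbour
    level, never from the base data directly.
    """
    levels: Dict[Tuple[int, int], Grid] = {(0, 0): base}
    for i in range(depth0 + 1):
        for j in range(depth1 + 1):
            if (i, j) == (0, 0):
                continue
            if j > 0:
                # Roll up along axis 1 from level (i, j - 1).
                levels[(i, j)] = rollup(levels[(i, j - 1)], axis=1)
            else:
                # Roll up along axis 0 from level (i - 1, 0).
                levels[(i, j)] = rollup(levels[(i - 1, j)], axis=0)
    return levels


base = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
levels = build_levels(base, depth0=2, depth1=2)
print(levels[(1, 1)])  # [[14, 22], [46, 54]]
print(levels[(2, 2)])  # [[136]]
```

Materialising every (i, j) combination, as done here, corresponds to the full cube of levels that the abstract warns can grow very large; the paper's partitioning and stratum-space model are what address that, and they are not modelled in this sketch.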
Pages: 2069-2073
Number of pages: 5
Related papers
50 in total
  • [31] Fast, Approximate Vector Queries on Very Large Unstructured Datasets
    Zhang, Zili
    Jin, Chao
    Tang, Linpeng
    Liu, Xuanzhe
    Jin, Xin
    [J]. PROCEEDINGS OF THE 20TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION, NSDI 2023, 2023, : 995 - 1011
  • [32] Phenoplant: a web resource for the exploration of large chlorophyll fluorescence image datasets
    Rousseau, Celine
    Hunault, Gilles
    Gaillard, Sylvain
    Bourbeillon, Julie
    Montiel, Gregory
    Simier, Philippe
    Campion, Claire
    Jacques, Marie-Agnes
    Belin, Etienne
    Boureau, Tristan
    [J]. PLANT METHODS, 2015, 11
  • [33] A NOVEL FAST PHASE UNWRAPPING METHOD FOR LARGE INTERFEROMETRIC DATASETS
    Wang, Zhibin
    Liu, Yanyang
    Li, Zhenfang
    Li, Jinwei
    Chen, Junli
    [J]. 2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 6445 - 6448
  • [34] Distance Based Fast Hierarchical Clustering Method for Large Datasets
    Patra, Bidyut Kr.
    Hubballi, Neminath
    Biswas, Santosh
    Nandi, Sukumar
    [J]. ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2010, 6086 : 50 - 59
  • [35] Fast Support Vector Machine classification of very large datasets
    Fehr, Janis
    Arreola, Karina Zapien
    Burkhardt, Hans
    [J]. DATA ANALYSIS, MACHINE LEARNING AND APPLICATIONS, 2008, : 11 - +
  • [36] Fast learning of generalized minimum enclosing ball for large datasets
    Hu, Wen-Jun
    Wang, Shi-Tong
    Wang, Juan
    Ying, Wen-Hao
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2012, 38 (11): : 1831 - 1840
  • [37] Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
    Klein, Aaron
    Falkner, Stefan
    Bartels, Simon
    Hennig, Philipp
    Hutter, Frank
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 528 - 536
  • [38] RainForest—A Framework for Fast Decision Tree Construction of Large Datasets
    Gehrke, Johannes
    Ramakrishnan, Raghu
    Ganti, Venkatesh
    [J]. Data Mining and Knowledge Discovery, 2000, 4 : 127 - 162
  • [39] VisReduce: Fast and responsive incremental information visualization of large datasets
    Im, Jean-Francois
    Villegas, Felix Giguere
    McGuffin, Michael J.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013
  • [40] Fast Emulation of Self-Organizing Maps for Large Datasets
    Cordel, Macario O., II
    Azcarraga, Arnulfo P.
    [J]. 6TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT-2015), THE 5TH INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY INFORMATION TECHNOLOGY (SEIT-2015), 2015, 52 : 381 - 388