Range CUBE: Efficient cube computation by exploiting data correlation

Cited by: 18
Authors
Feng, Y [1]
Agrawal, D [1]
El Abbadi, A [1]
Metwally, A [1]
Affiliations
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI
10.1109/ICDE.2004.1320035
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values and to compress the input dataset, effectively reducing the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on a real dataset show that range cubing runs in less than one thirtieth of the time while generating a range cube that occupies less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
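The idea that correlated attribute values let several cells share one aggregate can be illustrated with a small sketch. This is not the paper's range trie or range cubing algorithm: it computes a full COUNT cube by brute force and then enumerates the cells of one range. The toy rows, the `ALL` marker, the helper names `full_cube` and `cells_in_range`, and the representation of a range as a general lower-bound cell plus a specific upper-bound cell are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALL = "*"  # hypothetical marker for an aggregated ("any value") dimension


def full_cube(rows):
    """Brute-force COUNT cube: each row adds 1 to all 2^d of its generalizations."""
    counts = Counter()
    for row in rows:
        for mask in product((False, True), repeat=len(row)):
            cell = tuple(v if keep else ALL for v, keep in zip(row, mask))
            counts[cell] += 1
    return counts


def cells_in_range(lower, upper):
    """List the cells between a general lower bound and a specific upper bound.

    Assumed representation: the two bounds agree on every dimension except
    those where the lower bound is ALL and the upper bound holds a value.
    """
    options = []
    for lo, up in zip(lower, upper):
        if lo == up:
            options.append((lo,))
        elif lo == ALL:
            options.append((ALL, up))
        else:
            raise ValueError("lower bound must generalize upper bound")
    return list(product(*options))


# Toy data in which a1 only ever occurs together with b1 (a correlation).
rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")]
cube = full_cube(rows)

# Because of the correlation, (a1, *, *) and (a1, b1, *) cover the same rows.
range_cells = cells_in_range(("a1", ALL, ALL), ("a1", "b1", ALL))
range_counts = {cell: cube[cell] for cell in range_cells}
print(range_counts)
```

In this toy dataset both cells of the range carry the same count, so a compressed representation could store the pair of bounds and one aggregate instead of one entry per cell.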
Pages: 658-669
Page count: 12