A comparative study of cluster-based Big Data Cube implementations

Cited by: 3
Authors
Morielo Caetano, Andre Francisco [1]
Hirata, Celso Massaki [1]
Silva, Rodrigo Rocha [2, 3, 4]
Affiliations
[1] Inst Tecnol Aeronaut, Marechal Eduardo Gomes Sq 50, Sao Jose Dos Campos, Brazil
[2] Fac Tecnol Estado Sao Paulo, Carlos Barattino St 908, Mogi Das Cruzes, SP, Brazil
[3] Univ Coimbra, Paula Souza Ctr, Polo 2 Pinhal Marrocos, Coimbra, Portugal
[4] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, Polo 2 Pinhal Marrocos, Coimbra, Portugal
Keywords
Datacube; OLAP; Cloud; Big Data; Survey; Distributed; Parallel; COMPUTATION; SPARK; MPI;
DOI
10.1016/j.future.2022.03.024
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Research on Data Cube scalability is extensive, yet scattered. Scalable design patterns for Data Cube implementations are a trend as the technology shifts from centralized, fully materialized models to distributed, partially materialized ones. These implementations exploit cheaper and upgraded hardware in clusters of computer nodes. It is commonly understood that parallel and distributed hardware makes it possible to handle large amounts of multidimensional data for online analytical processing, up to billions of tuples or more, with increased performance and fault tolerance. However, the number of research works and their heterogeneity may overwhelm new initiatives in this field, as there is little discussion of the state of the art and of ways to improve it. Moreover, the baseline for comparison in most works is often too limited and requires the reader to cross-check information across many articles to identify possible gaps. To help identify these gaps, we analyzed the works on Data Cube scalability and prepared a comparative study that provides directions for new research on parallel and distributed implementations of data cubes. We identified features for comparison that include cube function, implementation technology, cube storage type, and various details of the experiments. We expect that the features and comparisons will help researchers identify research gaps and pave the way for future work in the field. (C) 2022 Elsevier B.V. All rights reserved.
Pages: 240 - 253
Page count: 14