Atrak: a MapReduce-based data warehouse for big data

Authors
Mohammadhossein Barkhordari
Mahdi Niamanesh
Affiliation
[1] Advance Information System Research Group for Information and Communication Technology Research Centre,
Keywords
Big data; MapReduce; Data warehouse; Data locality
Abstract
As warehouse data volumes grow, single-node solutions can no longer analyze the data, so shared-nothing architectures such as MapReduce become necessary. Splitting data across nodes in MapReduce, however, causes inter-node connectivity issues, network congestion, poor use of node memory capacity, and inefficient use of processing power. In addition, dimensions and measures cannot be changed without altering previously stored data, and managing big dimensions is problematic. This paper proposes a method called Atrak, which uses a unified data format to make Mapper nodes independent of one another and thereby address these data management problems. The method applies to star schema data warehouse models with distributive measures. Atrak increases query execution speed through node independence and proper use of MapReduce. The proposed method was compared with established systems such as Hive, Spark-SQL, HadoopDB and Flink, and simulation results confirm its improved query execution speed. Data unification in MapReduce can also be applied in other fields, such as data mining and graph processing.
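The role of distributive measures and independent mappers can be illustrated with a small sketch. This is not the Atrak implementation, whose data format the abstract does not describe; it only assumes that each record already carries its dimension attributes next to the measure, so a mapper can aggregate its own partition without contacting other nodes. The record layout and the names `PARTITIONS`, `map_partition`, `merge_partials` and `run_query` are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical unified records: every row holds its dimension values and the
# measure together, so no lookup against data on another node is needed.
PARTITIONS = [
    [{"region": "north", "year": 2015, "sales": 10.0},
     {"region": "south", "year": 2015, "sales": 4.0}],
    [{"region": "north", "year": 2015, "sales": 5.0},
     {"region": "south", "year": 2016, "sales": 7.0}],
]


def map_partition(rows, group_by, measure):
    """Aggregate one partition locally, using only the rows it holds."""
    partial = defaultdict(float)
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        partial[key] += row[measure]
    return dict(partial)


def merge_partials(left, right):
    """Combine two partial results; valid because SUM is distributive."""
    merged = defaultdict(float, left)
    for key, value in right.items():
        merged[key] += value
    return dict(merged)


def run_query(partitions, group_by, measure):
    """Run every mapper independently, then reduce the partial sums."""
    partials = [map_partition(p, group_by, measure) for p in partitions]
    return reduce(merge_partials, partials, {})


by_region = run_query(PARTITIONS, ["region"], "sales")
by_region_year = run_query(PARTITIONS, ["region", "year"], "sales")
```

Because a sum over the whole data set equals the sum of per-partition sums, the only data that has to move between nodes in this sketch is the small dictionary of partial results. A non-distributive measure such as a median could not be merged this way.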
Pages: 4596-4610
Page count: 14