Atrak: a MapReduce-based data warehouse for big data

Cited by: 0

Authors:

Mohammadhossein Barkhordari

Mahdi Niamanesh

Affiliation:

[1] Advance Information System Research Group for Information and Communication Technology Research Centre,

Source:

The Journal of Supercomputing | 2017 / Vol. 73

Keywords:

Big data; MapReduce; Data warehouse; Data locality

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

As warehouse data volumes expand, single-node solutions can no longer analyze the data. Shared-nothing architectures such as MapReduce therefore become necessary. However, inter-node data segmentation in MapReduce causes node connectivity issues, network congestion, improper use of node memory capacity, and inefficient use of processing power. In addition, dimensions and measures cannot be changed without changing previously stored data, and big dimensions are difficult to manage. This paper proposes a method called Atrak, which uses a unified data format to make Mapper nodes independent and thereby address these data management problems. The method applies to star schema data warehouse models with distributive measures. Atrak increases query execution speed through node independence and the proper use of MapReduce. It was compared with established systems such as Hive, Spark-SQL, HadoopDB and Flink, and simulation results confirm its improved query execution speed. Data unification in MapReduce can also be applied in other fields, such as data mining and graph processing.
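The abstract's restriction to distributive measures can be illustrated with a minimal sketch. It is not Atrak's implementation: the record layout, the `sales` measure, and the function names `map_partition`, `reduce_partials` and `run_query` are all hypothetical. It assumes that a "unified" record carries its dimension values together with the measure, so each mapper can aggregate its own partition without consulting another node, and that distributive measures (SUM, COUNT, MIN, MAX) can be merged from per-partition partial results.

```python
# Hypothetical denormalized ("unified") rows: dimension values are stored
# next to the measure, so a mapper needs no lookup on another node.
PARTITIONS = [
    [{"region": "north", "year": 2016, "sales": 10.0},
     {"region": "south", "year": 2016, "sales": 4.0}],
    [{"region": "north", "year": 2016, "sales": 6.0},
     {"region": "north", "year": 2017, "sales": 1.5}],
]


def map_partition(rows, group_by):
    """Aggregate one partition locally.

    Returns {group key: (sum, count, min, max)} for the "sales" measure.
    """
    partial = {}
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        value = row["sales"]
        if key in partial:
            s, c, lo, hi = partial[key]
            partial[key] = (s + value, c + 1, min(lo, value), max(hi, value))
        else:
            partial[key] = (value, 1, value, value)
    return partial


def reduce_partials(partials):
    """Merge per-partition results; valid because the measures are distributive."""
    merged = {}
    for partial in partials:
        for key, (s, c, lo, hi) in partial.items():
            if key in merged:
                ms, mc, mlo, mhi = merged[key]
                merged[key] = (ms + s, mc + c, min(mlo, lo), max(mhi, hi))
            else:
                merged[key] = (s, c, lo, hi)
    return merged


def run_query(partitions, group_by):
    """Run the map step on every partition independently, then reduce once."""
    return reduce_partials(map_partition(p, group_by) for p in partitions)


if __name__ == "__main__":
    for key, stats in sorted(run_query(PARTITIONS, ["region"]).items()):
        print(key, stats)
```

A measure such as the median is not distributive and could not be merged this way, which is consistent with the scope stated in the abstract.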

Citation

Pages: 4596 - 4610

Page count: 14

50 items in total

[21] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
Jiang, Hai
Chen, Yi
Qiao, Zhi
Weng, Tien-Hsiung
Li, Kuan-Ching
[J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01) : 369 - 383
[22] Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
Sardar T.H.
Ansari Z.
[J]. Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 73 - 82
[23] MapReduce-Based Complex Big Data Analytics over Uncertain and Imprecise Social Networks
Braun, Peter
Cuzzocrea, Alfredo
Jiang, Fan
Leung, Carson Kai-Sang
Pazdor, Adam G. M.
[J]. BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440 : 130 - 145
[24] MapReduce-Based D_ELT Framework to Address the Challenges of Geospatial Big Data
Jo, Junghee
Lee, Kang-Woo
[J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2019, 8 (11)
[25] Scaling up MapReduce-based Big Data Processing on Multi-GPU systems
Hai Jiang
Yi Chen
Zhi Qiao
Tien-Hsiung Weng
Kuan-Ching Li
[J]. Cluster Computing, 2015, 18 : 369 - 383
[26] LandQv2: A MapReduce-Based System for Processing Arable Land Quality Big Data
Yao, Xiaochuang
Mokbel, Mohamed E.
Ye, Sijing
Li, Guoqing
Alarabi, Louai
Eldawy, Ahmed
Zhao, Zuliang
Zhao, Long
Zhu, Dehai
[J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (07)
[27] MapReduce-based parallel GEP algorithm for efficient function mining in big data applications
Liu, Yang
Ma, Chenxiao
Xu, Lixiong
Shen, Xiaodong
Li, Maozhen
Li, Pengcheng
[J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23)
[28] MapReduce-based Parallel Algorithms for Multidimensional Data Analysis
Pan, Jie
Magoules, Frederic
Le Biannic, Yann
[J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2012, 6 (02) : 325 - 350
[29] Gaussian relevance vector MapReduce-based annealed Glowworm optimization for big medical data scheduling
Patan, Rizwan
Kallam, Suresh
Gandomi, Amir H.
Hanne, Thomas
Ramachandran, Manikandan
[J]. JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2022, 73 (10) : 2204 - 2215
[30] A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction
Xia, Dawen
Li, Huaqing
Wang, Binfeng
Li, Yantao
Zhang, Zili
[J]. IEEE ACCESS, 2016, 4 : 2920 - 2934
