Performance Challenges and Solutions in Big Data Platform Hadoop

Cited by: 0

Authors:

Singh B. [1,2]

Verma H.K. [1]

Madaan V. [2]

Affiliations:

[1] Department of Computer Science and Engineering, Dr. B.R. Ambedkar NIT, Jalandhar

[2] School of Computer Science and Engineering, Lovely Professional University, Phagwara

Source:

Recent Advances in Computer Science and Communications | 2023, Vol. 16, No. 09

Keywords:

big data; Hadoop; load balancing; performance; scheduling; skew

DOI:

10.2174/2666255816666230608165146

Abstract:

Background: Modern workloads demand continuous improvements in executing complex analytics on large-scale data, beyond what traditional systems can support. Objective: The need to process diverse data types and to provide solutions for different industry domains is rising. These needs call for more sophisticated techniques and methods to enhance existing platforms and mechanisms, and they give the research community an opportunity to investigate existing systems, identify potential issues, and propose improvements. Hadoop is a popular choice for managing and processing big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs, and the cost of scaling a cluster is low compared to other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques that advance the existing system. Method: This paper presents a systematic review to give insight into current progress in the field. Research publications from various sources are collected and analyzed. The performance of a cluster depends largely on the job processing mechanisms and policies associated with it. Conclusion: Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-off between different parameters; the process of content splitting and merging has not been explored to a large extent; and skew mitigation solutions focus mainly on the Reduce side of MapReduce, while the Map side is little used for load balancing. © 2023 Bentham Science Publishers.
