Performance Challenges and Solutions in Big Data Platform Hadoop

Authors
Singh B. [1, 2]
Verma H.K. [1]
Madaan V. [2]
Affiliations
[1] Department of Computer Science and Engineering, Dr. B.R. Ambedkar NIT, Jalandhar
[2] School of Computer Science and Engineering, Lovely Professional University, Phagwara
Keywords
big data; Hadoop; load balancing; performance; scheduling; skew
DOI
10.2174/2666255816666230608165146
Abstract
Background: The present era demands continuous improvement in executing complex analytics on large-scale data and in working beyond traditional systems. Objective: The need to process diverse data types and to deliver solutions for different industry domains is rising. These needs call for sophisticated techniques and methods that further enhance existing platforms and mechanisms, and they give the research community an opportunity to investigate existing systems, find potential issues, and propose new ways to improve them. Hadoop is a popular choice for managing and processing big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs, and the cost of scaling a cluster is low compared to other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques that advance the existing system. Method: This paper presents a systematic review to give insight into current progress in this field. Research publications from various sources are collected and analyzed. The performance of a cluster largely depends on the job processing mechanisms and policies associated with it. Conclusion: Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-off between different parameters, the process of content splitting and merging has not been explored to a large extent, and skew mitigation solutions focus mostly on the Reduce side of MapReduce, while the Map side is little used for load balancing. © 2023 Bentham Science Publishers.
Related papers
  • [31] Challenges and Solutions in Big data management - An Overview
    Kanchi, Sravanthi
    Sandilya, Shubhrika
    Ramkrishna, Shashank
    Manjrekar, Siddhesh
    Vhadgar, Akshata
    2015 3RD INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD) AND INTERNATIONAL CONFERENCE ON OPEN AND BIG (OBD), 2015, : 418 - 426
  • [32] Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
    Al-Absi, Ahmed Abdulhakim
    Kang, Dae-Ki
    Kim, Myong-Jong
    ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING: FUTURE INFORMATION TECHNOLOGY, VOL 2, 2016, 354 : 9 - 15
  • [33] Design and development of real-time query platform for big data based on hadoop
    Liu, Xiaoli
    Xu, Pandeng
    Liu, Mingliang
    Zhu, Guobin
    High Technology Letters, 2015, 21 (02) : 231 - 238
  • [35] Querying Capability Comparison of Hadoop Technologies to Find the More Sustainable Platform for Big Data
    Sharma, Upasna
    Bagga, Sachin
    Girdhar, Akshay
    2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 2896 - 2900
  • [36] Design and Implementation of Traffic Big Data Visualization Web GIS Platform Based on Hadoop
    Qiao, Zhitao
    Mu, Chen
    Sun, Jichao
    CICTP 2020: ADVANCED TRANSPORTATION TECHNOLOGIES AND DEVELOPMENT-ENHANCING CONNECTIONS, 2020, : 3101 - 3106
  • [37] Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing
    Lim, JongBeom
    Ahnh, Jong-Suk
    Lee, Kang-Woo
    ADVANCED SCIENCE LETTERS, 2016, 22 (09) : 2314 - 2319
  • [38] EverAnalyzer: A Self-Adjustable Big Data Management Platform Exploiting the Hadoop Ecosystem
    Karamolegkos, Panagiotis
    Mavrogiorgou, Argyro
    Kiourtis, Athanasios
    Kyriazis, Dimosthenis
    INFORMATION, 2023, 14 (02)
  • [39] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [40] Distributed Case-based Reasoning System Based on Big Data Platform Hadoop
    Wang, Chong-Yang
    Wang, Hong-Bing
    Liang, Yan-Rui
    2015 INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND INFORMATION SYSTEM (SEIS 2015), 2015, : 629 - 634