Performance Challenges and Solutions in Big Data Platform Hadoop

Authors
Singh B. [1, 2]
Verma H.K. [1 ]
Madaan V. [2 ]
Affiliations
[1] Department of Computer Science and Engineering, Dr. B.R. Ambedkar NIT, Jalandhar
[2] School of Computer Science and Engineering, Lovely Professional University, Phagwara
Keywords
big data; Hadoop; load balancing; performance; scheduling; skew
DOI
10.2174/2666255816666230608165146
Abstract
Background: The present era demands continuous improvement in executing complex analytics on large-scale data and in working beyond traditional systems. Objective: The need to process diverse data types and to provide solutions for different industry domains is rising. These needs call for sophisticated techniques and methods to further enhance existing platforms and mechanisms. This gives the research community an opportunity to investigate existing systems, find potential issues, and propose new ways to improve them. Hadoop is a popular choice for managing and processing big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs. The cost of scaling a cluster is low compared to other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques that advance the existing system. Method: A systematic review is presented in this paper to give insight into current progress in this field. Research publications from various sources are collected and analyzed. The performance of a cluster largely depends on the job processing mechanisms and policies associated with it. Conclusion: Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-off between different parameters; the process of content splitting and merging has not been explored to a large extent; and skew mitigation solutions focus mostly on the Reduce side of MapReduce, while the Map side is little used for load balancing. © 2023 Bentham Science Publishers.