Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

Cited by: 9
Authors
Sun, Xudong [1]
He, Yulin [1, 2]
Wu, Dingming [1]
Huang, Joshua Zhexue [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Analytical models; Costs; Computational modeling; Clustering algorithms; Distributed databases; Big Data; Programming; distributed computing frameworks; big data analysis; approximate computing; MapReduce computing model; MAP-REDUCE; MAPREDUCE; PERFORMANCE; MANAGEMENT; HADOOP; TAXONOMY; SYSTEMS;
DOI
10.26599/BDMA.2022.9020014
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed computing frameworks are a fundamental component of distributed computing systems. They provide an essential way to process big data efficiently on clusters or in the cloud. However, the size of big data is growing faster than the big data processing capacity of clusters. As a result, distributed computing frameworks based on the MapReduce computing model are not adequate for big data analysis tasks, which often require running complex analytical algorithms on extremely large, terabyte-scale data sets. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited range of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review the MapReduce-type distributed computing frameworks currently used to handle big data and discuss the problems they encounter in big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome these challenges.
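For readers unfamiliar with the MapReduce computing model named in the abstract, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce), using word counting as the task. It is an illustration only and not the paper's framework or any real library's API; the names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory illustration of the MapReduce computing model.
# Real frameworks distribute these phases across many machines.

def map_reduce(records, mapper, reducer):
    """Run mapper over each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record independently emits (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key. On a cluster this
            # step moves data between nodes, one source of the I/O and
            # communication costs the abstract mentions.
            groups[key].append(value)
    # Reduce phase: each key's values are combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1

def sum_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)

counts = map_reduce(["big data", "big clusters"], word_count_mapper, sum_reducer)
print(counts)
```

The sketch also hints at why some serial algorithms fit this model poorly: the mapper sees one record at a time and the reducer sees one key at a time, so an algorithm that needs shared, evolving state across all records has to be restructured into repeated map and reduce passes.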
Pages: 154-169
Number of pages: 16
Related papers
50 entries in total
  • [1] Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark
    Ketu, Shwet
    Mishra, Pramod Kumar
    Agarwal, Sonali
    [J]. COMPUTACION Y SISTEMAS, 2020, 24 (02): : 669 - 686
  • [2] Distributed Big Data Computing for Supporting Predictive Analytics of Service Requests
    Wang, Tianlei
    Harvey, James D.
    Leung, Carson K.
    Pazdor, Adam G. M.
    Chauhan, Animesh Singh
    Fan, Lihe
    Cuzzocrea, Alfredo
    [J]. 2021 IEEE 45TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2021), 2021, : 1723 - 1728
  • [3] An experimental survey on big data frameworks
    Inoubli, Wissem
    Aridhi, Sabeur
    Mezni, Haithem
    Maddouri, Mondher
    Nguifo, Engelbert Mephu
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 86 : 546 - 564
  • [4] Distributed Computing and Inference for Big Data
    Zhou, Ling
    Gong, Ziyang
    Xiang, Pengcheng
    [J]. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, 2024, 11 : 533 - 551
  • [5] Time series big data: a survey on data stream frameworks, analysis and algorithms
    Almeida, Ana
    Bras, Susana
    Sargento, Susana
    Pinto, Filipe Cabral
    [J]. JOURNAL OF BIG DATA, 2023, 10 (01)
  • [6] Time series big data: a survey on data stream frameworks, analysis and algorithms
    Ana Almeida
    Susana Brás
    Susana Sargento
    Filipe Cabral Pinto
    [J]. Journal of Big Data, 10
  • [7] Survey on Big Data and Cloud Computing
    Prabha, M. Surya
    Sarojini, B.
    [J]. 2017 2ND WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT), 2017, : 119 - 122
  • [8] Big Data Security Survey on Frameworks and Algorithms
    Chandra, Sudipta
    Ray, Soumya
    Goswami, R. T.
    [J]. 2017 7TH IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2017, : 48 - 54
  • [9] An advanced comparison on big data world computing frameworks
    Deshai, N.
    Venkataramana, S.
    Sekhar, B. V. D. S.
    Srinivas, K.
    Singh, Sundhar
    NagaKrishna, L.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTER VISION AND MACHINE LEARNING, 2019, 1228
  • [10] Distributed Fuzzy Rough Set for Big Data Analysis in Cloud Computing
    Qu, Wenhao
    Kong, Linghe
    Wu, Kaishun
    Tang, Feilong
    Chen, Guihai
    [J]. 2019 IEEE 25TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2019, : 109 - 116