Big data processing tools: An experimental performance evaluation

Cited by: 13
Authors
Rodrigues, Mario [1]
Santos, Maribel Yasmina [2]
Bernardino, Jorge [1, 3]
Affiliations
[1] Inst Engn Coimbra ISEC, Polytech Coimbra, Coimbra, Portugal
[2] Univ Minho, ALGORITMI Res Ctr, Guimaraes, Portugal
[3] Univ Coimbra, CISUC Ctr Informat, Coimbra, Portugal
Keywords
Big Data; Big Data analytics; query processing; SQL-on-Hadoop
DOI
10.1002/widm.1297
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big Data is currently a hot topic of research and development across several business areas, mainly due to recent innovations in information and communication technologies. One of the main challenges of Big Data is how to efficiently handle massive volumes of complex data. Because data collected from multiple sources is notoriously complex, and is typically gathered in ever-increasing volumes at high velocity, efficient processing mechanisms are needed for data analysis. The rapid growth of technologies, tools, and frameworks for Big Data has prompted much discussion about Big Data querying tools and, in particular, about which of them are most appropriate for specific analytical needs. This paper describes and evaluates the following popular Big Data processing tools: Drill, HAWQ, Hive, Impala, Presto, and Spark. An experimental evaluation using the TPC-H benchmark of the Transaction Processing Performance Council (TPC) is presented and discussed, highlighting the performance of each tool under different workloads and query types.
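The evaluation compares tools by workload and query type. As a rough illustration of what a single measurement of this kind involves, the following is a minimal sketch that times a simplified version of TPC-H Query 1 (the "Pricing Summary Report", an aggregation over the `lineitem` table). It assumes Python's built-in `sqlite3` as a stand-in engine; SQLite is not one of the evaluated tools, and this is not the paper's benchmark harness. The reduced schema, the omitted `avg_price`/`avg_disc` columns, the fixed 90-day cutoff, the toy rows, and the helper names `build_toy_lineitem`, `time_query`, and `run_demo` are all hypothetical choices made for illustration.

```python
import sqlite3
import time
from datetime import date, timedelta

# Simplified TPC-H Q1 ("Pricing Summary Report"). Assumptions: only the
# lineitem columns this query needs are created, avg_price/avg_disc are
# omitted, and the DELTA substitution parameter is fixed at 90 days.
Q1_SIMPLIFIED = """
SELECT l_returnflag,
       l_linestatus,
       SUM(l_quantity)                                       AS sum_qty,
       SUM(l_extendedprice)                                  AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity)                                       AS avg_qty,
       COUNT(*)                                              AS count_order
FROM lineitem
WHERE l_shipdate <= ?
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""

# Hypothetical toy rows, NOT data produced by the TPC-H generator:
# (quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate)
TOY_ROWS = [
    (10.0, 100.0, 0.5, 0.0, "N", "O", "1998-01-01"),
    (4.0, 40.0, 0.0, 0.25, "N", "O", "1998-06-01"),
    (2.0, 20.0, 0.0, 0.0, "R", "F", "1995-03-01"),
    (7.0, 70.0, 0.0, 0.0, "A", "F", "1998-12-01"),  # after the cutoff, filtered out
]


def build_toy_lineitem(conn, rows):
    """Hypothetical helper: create a reduced lineitem table and load rows."""
    conn.execute(
        "CREATE TABLE lineitem (l_quantity REAL, l_extendedprice REAL, "
        "l_discount REAL, l_tax REAL, l_returnflag TEXT, "
        "l_linestatus TEXT, l_shipdate TEXT)"
    )
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def time_query(conn, sql, params, repeats=3):
    """Run sql `repeats` times; return (result rows, best wall-clock seconds)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


def run_demo():
    # ISO-8601 date strings compare correctly as plain text in SQLite.
    cutoff = (date(1998, 12, 1) - timedelta(days=90)).isoformat()
    conn = sqlite3.connect(":memory:")
    try:
        build_toy_lineitem(conn, TOY_ROWS)
        return time_query(conn, Q1_SIMPLIFIED, (cutoff,))
    finally:
        conn.close()


if __name__ == "__main__":
    result, seconds = run_demo()
    for row in result:
        print(row)
    print(f"best of 3 runs: {seconds * 1000:.3f} ms")
```

Taking the best of several runs is one common way to damp timing noise from caching and scheduling; a real comparison of distributed engines would also have to control data scale, file format, and cluster configuration, which this single-process sketch does not attempt.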
Pages: 24