Big data processing tools: An experimental performance evaluation

Times cited: 13
Authors
Rodrigues, Mario [1]
Santos, Maribel Yasmina [2]
Bernardino, Jorge [1, 3]
Affiliations
[1] Inst Engn Coimbra ISEC, Polytech Coimbra, Coimbra, Portugal
[2] Univ Minho, ALGORITMI Res Ctr, Guimaraes, Portugal
[3] Univ Coimbra, CISUC Ctr Informat, Coimbra, Portugal
Keywords
Big Data; Big Data analytics; query processing; SQL-on-Hadoop
DOI
10.1002/widm.1297
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big Data is currently a hot topic of research and development across several business areas, mainly due to recent innovations in information and communication technologies. One of the main challenges of Big Data is how to efficiently handle massive volumes of complex data. Because the data collected from multiple sources is notoriously complex, and is typically gathered in growing volumes at high velocity, efficient processing mechanisms are needed for data analysis. Driven by the rapid growth of technologies, tools, and frameworks for Big Data, there is much discussion about Big Data querying tools and, specifically, about which of them are most appropriate for specific analytical needs. This paper describes and evaluates the following popular Big Data processing tools: Drill, HAWQ, Hive, Impala, Presto, and Spark. An experimental evaluation using the TPC-H benchmark from the Transaction Processing Performance Council (TPC) is presented and discussed, highlighting the performance of each tool under different workloads and query types.
Pages: 24