Big SQL systems: an experimental evaluation

Cited by: 2
Authors
Aluko, Victor [1]
Sakr, Sherif [1]
Affiliations
[1] Univ Tartu, Tartu, Estonia
Keywords
Big data; Big SQL; Benchmarking
DOI
10.1007/s10586-019-02914-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Big Data systems have recently gained popularity for handling the massive amounts of data that are continuously generated in our digital world. While the Hadoop framework pioneered the area of Big Data processing systems, it has clear performance limitations when processing massive amounts of structured data. In addition, many users of big data systems struggle with the APIs and low-level programming abstractions of these systems; they would prefer to use SQL, in which they are more proficient, as a high-level declarative language to express their tasks while leaving the execution optimization details to the backend engine. Several systems have therefore been designed and implemented to tackle these challenges by providing scalable query execution engines that process massive structured data while supporting SQL interfaces. In this article, we present an extensive experimental study of four popular systems in this domain, namely Apache Hive, Spark SQL, Apache Impala and PrestoDB. In particular, we report and analyze the performance characteristics of these systems using three different benchmarks, namely TPC-H, TPC-DS and TPCx-BB. Finally, we report a set of insights and important lessons learned from conducting our experiments.
Pages: 1347-1377
Page count: 31
Related papers
50 in total
  • [1] Big SQL systems: an experimental evaluation
    Victor Aluko
    Sherif Sakr
    [J]. Cluster Computing, 2019, 22 : 1347 - 1377
  • [2] Evaluation of ACE Properties of Traditional SQL and NoSQL Big Data Systems
    Teresa Gonzalez-Aparicio, Maria
    Younas, Muhammad
    Tuya, Javier
    Casado, Ruben
    [J]. SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 1988 - 1995
  • [3] Big Stream Processing Systems: An Experimental Evaluation
    Shahverdi, Elkhan
    Awad, Ahmed
    Sakr, Sherif
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2019), 2019, : 53 - 60
  • [4] LotusSQL: SQL Engine for High-Performance Big Data Systems
    Li, Xiaohan
    Yu, Bowen
    Feng, Guanyu
    Wang, Haojie
    Chen, Wenguang
    [J]. BIG DATA MINING AND ANALYTICS, 2021, 4 (04): : 252 - 265
  • [5] LotusSQL: SQL Engine for High-Performance Big Data Systems
    Xiaohan Li
    Bowen Yu
    Guanyu Feng
    Haojie Wang
    Wenguang Chen
    [J]. Big Data Mining and Analytics, 2021, (04) : 252 - 265
  • [6] The Performance of SQL-on-Hadoop Systems: An Experimental Study
    Qin, Xiongpai
    Chen, Yueguo
    Chen, Jun
    Li, Shuai
    Liu, Jiesi
    Zhang, Huijie
    [J]. 2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 464 - 471
  • [7] Performance evaluation of SQL and MongoDB databases for big e-commerce data
    Aboutorabi, Seyyed Hamid
    Rezapour, Mehdi
    Moradi, Milad
    Ghadiri, Nasser
    [J]. CSSE 2015 20TH INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING, 2015,
  • [8] SQL-On-Hadoop Systems: Evaluating Performance of Polybase for Big Data Processing
    Minukhin, Sergii
    Fedko, Victor
    Sitnikov, Dmytro
    [J]. 2018 INTERNATIONAL SCIENTIFIC-PRACTICAL CONFERENCE: PROBLEMS OF INFOCOMMUNICATIONS SCIENCE AND TECHNOLOGY (PIC S&T), 2018, : 591 - 594
  • [9] BIG-SISTERS - AN EXPERIMENTAL EVALUATION
    SEIDL, FW
    [J]. ADOLESCENCE, 1982, 17 (65) : 117 - 128
  • [10] An Experimental Study of Big Spatial Data Systems
    Hulbert, Andrew
    Kunicki, Thomas
    Hughes, James N.
    Fox, Anthony D.
    Eichelberger, Christopher N.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2664 - 2671