A Performance Study of Big Data Analytics Platforms

Cited by: 0
Authors
Pirzadeh, Pouria [1, 2]
Carey, Michael [2]
Westmann, Till [3]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Univ Calif Irvine, Irvine, CA USA
[3] Couchbase Inc, Mountain View, CA USA
Keywords
Big Data; Performance Evaluation; Benchmarking
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big Data analytics has become an invaluable tool in a wide variety of businesses for exploiting the wealth of Big Data that they now have access to. As a result, various solutions within different categories of Big Data systems are emerging to meet their needs. In this paper we use the TPC-H benchmark to compare the performance of four Big Data systems picked from the major categories of Big Data platforms: a commercial parallel relational database (from the traditional DBMS world), Hive and Spark SQL (from the SQL-on-Hadoop world), and AsterixDB (from the world of NoSQL systems). All of these systems have sufficiently rich query APIs and runtime systems to run TPC-H in its full form. On the other hand, the systems also have major differences in terms of their architectures, preferred storage formats, support for complex schema definitions, and approaches to query processing. This makes them a very interesting set of representative Big Data systems to compare. We present the results that we obtained through running these systems at different TPC-H scales using various settings, and we analyze a selected set of interesting query results in more detail to explore the trade-offs between performance, storage formats, and schema definitions. A follow-up discussion is included as well to summarize the lessons learned from this effort.
Pages: 2911-2920
Page count: 10