A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms

Cited by: 8
Authors
Costa, Rogerio Luis de C. [1, 2]
Moreira, Jose [2, 3]
Pintor, Paulo [3]
dos Santos, Veronica [4]
Lifschitz, Sergio [4]
Affiliations
[1] Polytech Leiria, Comp Sci & Commun Res Ctr CIIC, P-2411901 Leiria, Portugal
[2] Univ Aveiro, Inst Elect & Informat Engn IEETA, P-3810193 Aveiro, Portugal
[3] Univ Aveiro, Dept Eletron Telecommun & Informat DETI, P-3810193 Aveiro, Portugal
[4] Pontificia Univ Catolica Rio de Janeiro PUC Rio, Dept Informat, BR-22451900 Rio De Janeiro, RJ, Brazil
Keywords
Big data systems; Big data platforms; Performance tuning; Database systems; DATA SYSTEMS; ARCHITECTURE; FRAMEWORK; EFFICIENT; DESIGN; SPARK; CHALLENGES; MANAGEMENT; INTERNET; ENGINE;
DOI
10.1016/j.bdr.2021.100206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many research works address big data platforms with a view to data science and analytics. These are complex and usually distributed environments, composed of several systems and tools. As expected, there is a need for a closer look at performance issues. In this work, we review performance tuning strategies in the big data environment. We focus on data-driven tuning techniques, discussing the use of database-inspired approaches. For big data and NoSQL stores, performance tuning issues are quite different from those of so-called conventional systems. Many existing solutions are mostly ad hoc activities that do not fit multiple situations. But there are some categories of data-driven solutions that can be taken as guidelines and incorporated into general-purpose auto-tuning modules for big data systems. We examine typical performance tuning actions, discussing available solutions to support some of the tuning process's primary activities. We also discuss recent implementations of data-driven performance tuning solutions for big data platforms. We propose an initial classification based on the domain state of the art and present selected tuning actions for large-scale data processing systems. Finally, we organize existing works towards self-tuning big data systems based on this classification and present general and system-specific tuning recommendations. We found that most of the literature evaluates tuning actions from the physical design perspective, and that there is a lack of self-tuning machine learning-based solutions for big data systems. (C) 2021 Elsevier Inc. All rights reserved.
Pages: 17