d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics

Cited by: 14
Authors
Chen, Yuxing [1]
Goetsch, Peter [1]
Hoque, Mohammad A. [1]
Lu, Jiaheng [1]
Tarkoma, Sasu [1]
Affiliation
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00560, Finland
Funding
Academy of Finland
Keywords
Performance modeling; big data analytics; adaptive sampling; Delaunay triangulation; MapReduce
DOI
10.1109/TBDATA.2019.2948338
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such as memory size, CPU allocation, and the number of running nodes. Regular users and even expert administrators struggle to understand how different parameter configurations relate to the overall performance of the system. In this paper, we address this challenge by proposing a performance prediction framework, called d-Simplexed, that builds performance models over varied configurable parameters on Spark. Taking inspiration from computational geometry, we construct a d-dimensional mesh using Delaunay triangulation over a selected set of features. From this mesh, we predict the execution time for various feature configurations. To minimize the time and resources needed to build a bootstrap model over a large number of configuration values, we propose an adaptive sampling technique that collects only as many training points as required. Our evaluation on a computer cluster using the WordCount, PageRank, Kmeans, and Join workloads from the HiBench benchmark suite shows that we can achieve an estimation error rate below 5 percent while sampling less than 1 percent of the data.
Pages: 458-469
Page count: 12
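The prediction idea in the abstract (build a Delaunay mesh over configuration features, then predict execution time for a new configuration from the mesh) can be illustrated in two dimensions with the minimal sketch below. It is not the authors' implementation. It assumes two hypothetical features (executor memory in GB and number of executors), made-up timings, the Bowyer-Watson algorithm to build the triangulation, and linear interpolation with barycentric weights inside the enclosing triangle; the paper's own interpolation and sampling details may differ. The functions `delaunay` and `predict` are illustrative names only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Tri = Tuple[int, int, int]  # indices into the list of training points


def _in_circumcircle(a: Point, b: Point, c: Point, p: Point) -> bool:
    """True if p lies strictly inside the circumcircle of triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:  # collinear vertices have no finite circumcircle
        return False
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    r2 = (ax - ux) ** 2 + (ay - uy) ** 2
    # Points (almost) exactly on the circle count as outside (strict test).
    return (p[0] - ux) ** 2 + (p[1] - uy) ** 2 < r2 * (1.0 - 1e-9)


def delaunay(points: Sequence[Point]) -> List[Tri]:
    """Bowyer-Watson Delaunay triangulation of 2-D points (assumed technique)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    mx, my = (max(xs) + min(xs)) / 2.0, (max(ys) + min(ys)) / 2.0
    # A large "super-triangle" that encloses every training point.
    verts: List[Point] = list(points) + [
        (mx - 20 * span, my - 10 * span),
        (mx + 20 * span, my - 10 * span),
        (mx, my + 20 * span),
    ]
    n = len(points)
    tris: List[Tri] = [(n, n + 1, n + 2)]
    for i in range(n):
        p = verts[i]
        # Triangles whose circumcircle contains p form the cavity to rebuild.
        bad = [t for t in tris
               if _in_circumcircle(verts[t[0]], verts[t[1]], verts[t[2]], p)]
        # Cavity boundary = edges that belong to exactly one bad triangle.
        count: Dict[Tuple[int, int], int] = {}
        for a, b, c in bad:
            for e in ((a, b), (b, c), (c, a)):
                key = (min(e), max(e))
                count[key] = count.get(key, 0) + 1
        tris = [t for t in tris if t not in bad]
        # Connect the new point to every boundary edge of the cavity.
        tris += [(a, b, i) for (a, b), k in count.items() if k == 1]
    # Drop triangles that still use a super-triangle vertex.
    return [t for t in tris if max(t) < n]


def _barycentric(a: Point, b: Point, c: Point, p: Point) -> Tuple[float, float, float]:
    """Barycentric weights of p with respect to triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w1, w2, 1.0 - w1 - w2


def predict(points: Sequence[Point], times: Sequence[float],
            tris: Sequence[Tri], q: Point) -> Optional[float]:
    """Piecewise-linear estimate of execution time at configuration q."""
    for i, j, k in tris:
        w = _barycentric(points[i], points[j], points[k], q)
        if min(w) >= -1e-9:  # q lies inside (or on an edge of) this triangle
            return w[0] * times[i] + w[1] * times[j] + w[2] * times[k]
    return None  # q is outside the convex hull of the sampled configurations


if __name__ == "__main__":
    # Hypothetical configurations: (executor memory in GB, number of executors).
    configs = [(2.0, 2.0), (8.0, 2.0), (2.0, 6.0), (8.0, 6.0), (4.0, 4.0)]
    seconds = [410.0, 260.0, 230.0, 120.0, 250.0]  # made-up execution times
    mesh = delaunay(configs)
    print(predict(configs, seconds, mesh, (5.0, 4.0)))
```

In more than two dimensions, SciPy's `scipy.spatial.Delaunay` builds the triangulation directly and `scipy.interpolate.LinearNDInterpolator` performs the same piecewise-linear interpolation over it. A query outside the convex hull of the sampled configurations cannot be interpolated this way, which is why `predict` returns `None` there.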