Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0
Authors
Garralda-Barrio, Mariano [1]
Eiras-Franco, Carlos [1]
Bolon-Canedo, Veronica [1]
Affiliations
[1] Univ A Coruna, CITIC, La Coruna, Spain
Keywords
Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing
DOI
10.1016/j.future.2025.107730
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
Pages: 17
Related papers
50 in total
  • [31] Efficient Machine Learning for Big Data: A Review
    Al-Jarrah, Omar Y.
    Yoo, Paul D.
    Muhaidat, Sami
    Karagiannidis, George K.
    Taha, Kamal
    BIG DATA RESEARCH, 2015, 2 (03) : 87 - 93
  • [32] Quantifying the Performance Impact of Large Pages on In-Memory Big-Data Workloads
    Park, Jinsu
    Han, Myeonggyun
    Baek, Woongki
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2016, : 209 - 218
  • [33] Adaptive framework for deep learning based dynamic and temporal topic modeling from big data
    Pathak, A. R.
    Pandey, M.
    Rautaray, S.
    Recent Patents on Engineering, 2020, 14 (03) : 394 - 402
  • [34] Incremental Deep Computation Model for Wireless Big Data Feature Learning
    Zhang, Qingchen
    Yang, Laurence T.
    Chen, Zhikui
    Li, Peng
    IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (02) : 248 - 257
  • [35] Towards Efficient NVDIMM-based Heterogeneous Storage Hierarchy Management for Big Data Workloads
    Chen, Renhai
    Shao, Zili
    Liu, Duo
    Feng, Zhiyong
    Li, Tao
    MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 849 - 860
  • [36] Efficient and Secure Transfer, Synchronization, and Sharing of Big Data
    Chard, Kyle
    Tuecke, Steven
    Foster, Ian
    IEEE CLOUD COMPUTING, 2014, 1 (03) : 46 - 55
  • [37] Incremental fuzzy learning algorithms in big data problems: a study on the size of learning subsets
    Romero-Zaliz, Rocio
    Gonzalez, Antonio
    Perez, Raul
    2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,
  • [38] A PERFORMANCE MODELING LANGUAGE FOR BIG DATA ARCHITECTURES
    Barbierato, Enrico
    Gribaudo, Marco
    Iacono, Mauro
    PROCEEDINGS 27TH EUROPEAN CONFERENCE ON MODELLING AND SIMULATION ECMS 2013, 2013, : 511 - +
  • [39] Survey of Performance Modeling of Big Data Applications
    Pattanshetti, Tanuja
    Attar, Vahida
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE AND ENGINEERING (CONFLUENCE 2017), 2017, : 177 - 181
  • [40] d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics
    Chen, Yuxing
    Goetsch, Peter
    Hoque, Mohammad A.
    Lu, Jiaheng
    Tarkoma, Sasu
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (02) : 458 - 469