Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0

Authors:

Garralda-Barrio, Mariano [1]

Eiras-Franco, Carlos [1]

Bolon-Canedo, Veronica [1]

Affiliations:

[1] Univ A Coruna, CITIC, La Coruna, Spain

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2025 / Vol. 166

Keywords:

Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing

DOI:

10.1016/j.future.2025.107730

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.


Pages: 17

50 items in total

[31] Efficient Machine Learning for Big Data: A Review
Al-Jarrah, Omar Y.
Yoo, Paul D.
Muhaidat, Sami
Karagiannidis, George K.
Taha, Kamal
BIG DATA RESEARCH, 2015, 2 (03) : 87 - 93
[32] Quantifying the Performance Impact of Large Pages on In-Memory Big-Data Workloads
Park, Jinsu
Han, Myeonggyun
Baek, Woongki
PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2016, : 209 - 218
[33] Adaptive framework for deep learning based dynamic and temporal topic modeling from big data
Pathak A.R.
Pandey M.
Rautaray S.
Recent Patents on Engineering, 2020, 14 (03) : 394 - 402
[34] Incremental Deep Computation Model for Wireless Big Data Feature Learning
Zhang, Qingchen
Yang, Laurence T.
Chen, Zhikui
Li, Peng
IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (02) : 248 - 257
[35] Towards Efficient NVDIMM-based Heterogeneous Storage Hierarchy Management for Big Data Workloads
Chen, Renhai
Shao, Zili
Liu, Duo
Feng, Zhiyong
Li, Tao
MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2019, : 849 - 860
[36] Efficient and Secure Transfer, Synchronization, and Sharing of Big Data
Chard, Kyle
Tuecke, Steven
Foster, Ian
IEEE CLOUD COMPUTING, 2014, 1 (03) : 46 - 55
[37] Incremental fuzzy learning algorithms in big data problems: a study on the size of learning subsets
Romero-Zaliz, Rocio
Gonzalez, Antonio
Perez, Raul
2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,
[38] A PERFORMANCE MODELING LANGUAGE FOR BIG DATA ARCHITECTURES
Barbierato, Enrico
Gribaudo, Marco
Iacono, Mauro
PROCEEDINGS 27TH EUROPEAN CONFERENCE ON MODELLING AND SIMULATION ECMS 2013, 2013, : 511 - +
[39] Survey of Performance Modeling of Big Data Applications
Pattanshetti, Tanuja
Attar, Vahida
PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE AND ENGINEERING (CONFLUENCE 2017), 2017, : 177 - 181
[40] d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics
Chen, Yuxing
Goetsch, Peter
Hoque, Mohammad A.
Lu, Jiaheng
Tarkoma, Sasu
IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (02) : 458 - 469
