Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0

Authors:

Garralda-Barrio, Mariano [1]

Eiras-Franco, Carlos [1]

Bolon-Canedo, Veronica [1]

Affiliations:

[1] Univ A Coruna, CITIC, La Coruna, Spain

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2025 / Vol. 166

Keywords:

Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing

DOI:

10.1016/j.future.2025.107730

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.


Pages: 17
