Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0
Authors
Garralda-Barrio, Mariano [1]
Eiras-Franco, Carlos [1]
Bolon-Canedo, Veronica [1]
Affiliations
[1] Univ A Coruna, CITIC, La Coruna, Spain
Keywords
Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing
DOI
10.1016/j.future.2025.107730
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
Pages: 17
Related papers
50 in total
  • [21] Utilizing NVDIMM to alleviate the I/O Performance Gap for Big Data Workloads
    Shao, Zili
    2017 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2017
  • [22] Data Efficient Lithography Modeling with Residual Neural Networks and Transfer Learning
    Lin, Yibo
    Watanabe, Yuki
    Kimura, Taiki
    Matsunawa, Tetsuaki
    Nojima, Shigeki
    Li, Meng
    Pan, David Z.
    PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN (ISPD'18), 2018, : 82 - 89
  • [23] EarnCache: Self-adaptive Incremental Caching for Big Data Applications
    Luo, Yifeng
    Guo, Junshi
    Zhou, Shuigeng
    WEB AND BIG DATA (APWEB-WAIM 2018), PT II, 2018, 10988 : 379 - 393
  • [24] Utilizing NVDIMM to alleviate the I/O Performance Gap for Big Data Workloads
    Shao, Zili
    2017 INTERNATIONAL SYMPOSIUM ON VLSI TECHNOLOGY, SYSTEMS AND APPLICATION (VLSI-TSA), 2017
  • [25] Adaptive online incremental learning for evolving data streams
    Zhang, Si-si
    Liu, Jian-wei
    Zuo, Xin
    APPLIED SOFT COMPUTING, 2021, 105
  • [26] Performance Prediction of Big Data Transfer Through Experimental Analysis and Machine Learning
    Yun, Daqing
    Liu, Wuji
    Wu, Chase Q.
    Rao, Nageswara S. V.
    Kettimuthu, Rajkumar
    2020 IFIP NETWORKING CONFERENCE AND WORKSHOPS (NETWORKING), 2020, : 181 - 189
  • [27] Efficient learning from big data for cancer risk modeling: A case study with melanoma
    Richter, Aaron N.
    Khoshgoftaar, Taghi M.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2019, 110 : 29 - 39
  • [28] Efficient finer-grained incremental processing with MapReduce for big data
    Zhang, Liang
    Feng, Yuanyuan
    Shen, Peiyi
    Zhu, Guangming
    Wei, Wei
    Song, Juan
    Shah, Syed Afaq Ali
    Bennamoun, Mohammed
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 80 : 102 - 111
  • [29] Representative Data Selection for Efficient Medical Incremental Learning
    Wei, Bo-Quan
    Chen, Jen-Jee
    Tseng, Yu-Chee
    Kuo, Po-Tsun Paul
    2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023
  • [30] An Efficient Adaptive Graph Anonymization Framework For Incremental Data Publication
    Yue, Rong
    Li, YiDong
    Wang, Tao
    Jin, Yi
    2018 5TH INTERNATIONAL CONFERENCE ON BEHAVIORAL, ECONOMIC, AND SOCIO-CULTURAL COMPUTING (BESC), 2018, : 103 - 108