Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0
Authors
Garralda-Barrio, Mariano [1]
Eiras-Franco, Carlos [1]
Bolon-Canedo, Veronica [1]
Affiliations
[1] Univ A Coruna, CITIC, La Coruna, Spain
Keywords
Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing
DOI
10.1016/j.future.2025.107730
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
Pages: 17
Related papers
50 in total
  • [1] Learning-based Characterizing and Modeling Performance Bottlenecks of Big Data Workloads
    Guo, Zhongxin
    Hu, Zheng
    Zhang, Chunhong
    Pu, Youer
    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 860 - 867
  • [2] Fast Modeling of Analytics Workloads for Big Data Services
    Yang, Lin
    Li, Changsheng
    Fan, Liya
    Xu, Jingmin
    PROCEEDINGS 2014 INTERNATIONAL CONFERENCE ON SERVICE SCIENCES (ICSS 2014), 2014, : 101 - 105
  • [3] Ensemble diagnosis method based on transfer learning and incremental learning towards mechanical big data
    Wang, Jianyu
    Mo, Zhenling
    Zhang, Heng
    Miao, Qiang
    MEASUREMENT, 2020, 155
  • [4] Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment
    Makrani, Hosein Mohammadi
    Sayadi, Hossein
    Nazari, Najmeh
    Dinakarrao, Sai Manoj Pudukotai
    Sasan, Avesta
    Mohsenin, Tinoosh
    Rafatirad, Setareh
    Homayoun, Houman
    ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, 2020, 5 (04)
  • [5] Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System
    Lu, Xiaoyi
    Shi, Haiyang
    Shankar, Dipti
    Panda, Dhabaleswar K.
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 213 - 222
  • [6] Adaptive knowledge transfer for class incremental learning
    Feng, Zhikun
    Zhou, Mian
    Gao, Zan
    Stefanidis, Angelos
    Su, Jionglong
    Dang, Kang
    Li, Chuanhui
    PATTERN RECOGNITION LETTERS, 2024, 183 : 165 - 171
  • [7] Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection
    Li, Yibo
    Li, Meng
    Watanabe, Yuki
    Kimura, Taiki
    Matsunawa, Tetsuaki
    Nojima, Shigeki
    Pan, David Z.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (10) : 1900 - 1913
  • [8] Efficient Data Transfer Protocols for Big Data
    Tierney, Brian
    Kissel, Ezra
    Swany, Martin
    Pouyoul, Eric
    2012 IEEE 8TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE), 2012
  • [9] Performance Modeling and Analysis of a Hadoop Cluster for Efficient Big Data Processing
    Lim, JongBeom
    Ahn, Jong-Suk
    Lee, Kang-Woo
    ADVANCED SCIENCE LETTERS, 2016, 22 (09) : 2314 - 2319
  • [10] Incremental learning framework for mining big data stream
    Eisa, Alaa
    EL-Rashidy, Nora
    Alshehri, Mohammad Dahman
    El-Bakry, Hazem M.
    Abdelrazek, Samir
    Computers, Materials and Continua, 2022, 71 (02) : 2901 - 2921