Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning

Times cited: 1
Authors
Liu, Zhangyu [1]
Zhang, Cheng [1]
Wu, Huijun [2]
Fang, Jianbin [2]
Peng, Lin [2]
Ye, Guixin
Tang, Zhanyong [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HPC; Parallel I/O; Performance Optimization; Auto-tuning; Ensemble Learning
DOI
10.1109/CLUSTER52292.2023.00027
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To improve parallel I/O performance, it is essential to optimize the tunable parameters across the different layers of the I/O software stack. Finding an optimal configuration for different scenarios is hampered by the complex interactions among these parameters and by the size of the parameter space. Previous research has tuned these parameters using standalone search algorithms; however, these approaches suffer from shortcomings such as unstable performance results and slow convergence. This paper introduces OPRAEL, an auto-tuning approach for parallel I/O tasks that combines ensemble learning with regression-based performance modeling. To evaluate its effectiveness, we applied this approach on the Tianhe-II supercomputer using one well-known I/O benchmark (IOR) and two I/O kernels (S3D-I/O and BT-I/O). Leveraging our experience in predictive modeling, we optimized the tuning of the I/O stack parameters. Our experimental results show a 10.2x write-performance speedup for the BT-I/O optimization task with a 500x500x500 input. We also compared using a single search algorithm against using reinforcement-learning-based search in the I/O parameter auto-tuning task. Our results show that OPRAEL outperforms the traditional approach, achieving up to an 8.4x improvement in write performance for the 128-process IOR optimization task.
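The abstract describes the method only at a high level. As a rough illustration of the general idea of regression-guided parameter tuning (not OPRAEL itself, whose algorithms are not given in this record), the following Python sketch pools candidate configurations from two simple search strategies, ranks them with a linear regression model fitted to past measurements, and measures only the top-ranked candidate. The parameter names (`stripe_count`, `stripe_size_mb`, `cb_nodes`), their value ranges, and the synthetic `measure` function are hypothetical stand-ins for a real benchmark run such as IOR.

```python
import random

# Hypothetical tuning space: names and value ranges are illustrative
# assumptions, not the parameters or ranges used by OPRAEL.
SPACE = {
    "stripe_count": [1, 2, 4, 8, 16, 32],
    "stripe_size_mb": [1, 2, 4, 8, 16],
    "cb_nodes": [1, 2, 4, 8],
}
KEYS = sorted(SPACE)


def measure(cfg):
    """Hypothetical stand-in for a benchmark run: synthetic bandwidth in MB/s."""
    return (100 * cfg["stripe_count"] ** 0.5
            + 20 * cfg["stripe_size_mb"] ** 0.5
            - 5 * abs(cfg["cb_nodes"] - 4))


def features(cfg):
    """Bias term followed by one numeric feature per parameter."""
    return [1.0] + [float(cfg[k]) for k in KEYS]


def fit_linear(xs, ys, ridge=1e-6):
    """Least-squares fit via lightly regularized normal equations."""
    n = len(xs[0])
    a = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w


def predict(w, cfg):
    """Predicted bandwidth of a configuration under the fitted linear model."""
    return sum(wi * xi for wi, xi in zip(w, features(cfg)))


def random_cfg(rng):
    """Search strategy 1: uniform random sampling of the space."""
    return {k: rng.choice(SPACE[k]) for k in KEYS}


def neighbor(rng, cfg):
    """Search strategy 2: move one parameter to an adjacent value."""
    k = rng.choice(KEYS)
    vals = SPACE[k]
    i = vals.index(cfg[k]) + rng.choice([-1, 1])
    new = dict(cfg)
    new[k] = vals[min(max(i, 0), len(vals) - 1)]
    return new


def tune(rounds=10, seed=0, pool=20):
    """Return (best_config, best_measured_bandwidth) after `rounds` steps."""
    rng = random.Random(seed)
    # Initial random measurements so the regression model has data to fit.
    history = [(c, measure(c)) for c in
               (random_cfg(rng) for _ in range(len(KEYS) + 2))]
    for _ in range(rounds):
        w = fit_linear([features(c) for c, _ in history],
                       [y for _, y in history])
        best_cfg = max(history, key=lambda h: h[1])[0]
        # Pool proposals from both strategies; the model picks one to measure.
        candidates = ([random_cfg(rng) for _ in range(pool)]
                      + [neighbor(rng, best_cfg) for _ in range(pool)])
        pick = max(candidates, key=lambda c: predict(w, c))
        history.append((pick, measure(pick)))
    return max(history, key=lambda h: h[1])


if __name__ == "__main__":
    print(tune(rounds=20, seed=1))
```

Calling `tune(rounds=20, seed=1)` returns the best configuration found and its measured (here synthetic) bandwidth. Only the model's top pick is measured each round, which is the point of a performance model: it replaces most expensive benchmark runs with cheap predictions. A linear model is used purely for brevity; capturing the parameter interactions the abstract mentions would likely require a nonlinear regressor.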
Pages: 234-246
Number of pages: 13