iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems

Cited by: 11
Authors
Wadhwa, Bharti [1]
Paul, Arnab K. [1]
Neuwirth, Sarah [2]
Wang, Feiyi [3]
Oral, Sarp [3]
Butt, Ali R. [1]
Bernard, Jon [1]
Cameron, Kirk W. [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Heidelberg Univ, Heidelberg, Germany
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
DOI
10.1109/IPDPS.2019.00070
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. Mitigating this load imbalance poses two conflicting challenges: (i) optimizing system-wide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O load request latency between clients and servers, i.e., allocating I/O resources efficiently in service to a single application and job run. Moreover, existing approaches that require application changes limit widespread adoption in commercial or proprietary deployments. We propose iez, an "end-to-end control plane" where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information for global data placement across distributed storage servers, while our design model leverages trace-based optimization techniques to minimize I/O load request latency between clients and servers. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR for large sequential writes and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.
Pages: 610-620
Page count: 11