iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems

Cited by: 11
Authors
Wadhwa, Bharti [1]
Paul, Arnab K. [1]
Neuwirth, Sarah [2]
Wang, Feiyi [3]
Oral, Sarp [3]
Butt, Ali R. [1]
Bernard, Jon [1]
Cameron, Kirk W. [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Heidelberg Univ, Heidelberg, Germany
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
DOI
10.1109/IPDPS.2019.00070
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. There are two conflicting challenges to mitigate this load imbalance: (i) optimizing system-wide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O load request latency between clients and servers, i.e., allocating I/O resources efficiently in service to a single application and job run. Moreover, existing approaches that require application changes limit wide-spread adoption in commercial or proprietary deployments. We propose iez, an "end-to-end control plane" where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information for distributed storage server global data placement while our design model leverages trace-based optimization techniques to minimize I/O load request latency between clients and servers. We evaluate our proposed system on an experimental cluster for two common use cases: synthetic I/O benchmark IOR for large sequential writes and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.
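The abstract describes clients writing to a set of I/O servers selected using real-time load information. A minimal sketch of that general idea follows. It is not the authors' algorithm: the function names `select_servers` and `record_write`, the server labels, and the single scalar load value per server are all assumptions made for illustration, and the trace-based optimization mentioned in the abstract is not modeled.

```python
import heapq
from typing import Dict, List


def select_servers(load_by_server: Dict[str, float], stripe_count: int) -> List[str]:
    """Return the `stripe_count` least-loaded servers.

    Hypothetical helper: ties are broken by server name so the result
    is deterministic.
    """
    if stripe_count <= 0:
        raise ValueError("stripe_count must be positive")
    ranked = heapq.nsmallest(
        stripe_count,
        load_by_server.items(),
        key=lambda item: (item[1], item[0]),
    )
    return [name for name, _ in ranked]


def record_write(
    load_by_server: Dict[str, float], servers: List[str], amount: float
) -> None:
    """Spread `amount` of new load evenly over the chosen servers.

    Assumption: load is a single additive number per server.
    """
    if not servers:
        raise ValueError("servers must not be empty")
    share = amount / len(servers)
    for name in servers:
        load_by_server[name] += share


# Illustrative load values (made up for this sketch).
loads = {"ost0": 0.9, "ost1": 0.2, "ost2": 0.5, "ost3": 0.2}
first = select_servers(loads, 2)    # the two least-loaded servers
record_write(loads, first, 1.0)     # account for the new write
second = select_servers(loads, 2)   # the selection shifts as load changes
```

Re-running the selection after each write is what makes the placement adaptive in this sketch: servers that just received data rank lower on the next request.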
Pages: 610-620
Number of pages: 11
Related papers
50 in total
  • [1] Automatic and Transparent Resource Contention Mitigation for Improving Large-scale Parallel File System Performance
    Neuwirth, Sarah
    Wang, Feiyi
    Oral, Sarp
    Bruening, Ulrich
    2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2017, : 604 - 613
  • [2] Scheduling parallel processes and load balancing in large-scale computing systems
    Kutepov, V. P.
    DCABES 2007 Proceedings, Vols I and II, 2007, : 444 - 448
  • [3] A load balancing parallel algorithm for solving large-scale tridiagonal linear systems
    Tian, Min
    Qiao, Shan
    Wang, Junjie
    Du, Wei
    INTERNATIONAL CONFERENCE ON ALGORITHMS, HIGH PERFORMANCE COMPUTING, AND ARTIFICIAL INTELLIGENCE (AHPCAI 2021), 2021, 12156
  • [4] Load balancing in large-scale RFID systems
    Dong, Qunfeng
    Shukla, Ashutosh
    Shrivastava, Vivek
    Agrawal, Dheeraj
    Banerjee, Suman
    Kar, Koushik
    COMPUTER NETWORKS, 2008, 52 (09) : 1782 - 1796
  • [5] Load balancing in large-scale RFID systems
    Dong, Qunfeng
    Shukla, Ashutosh
    Shrivastava, Vivek
    Agrawal, Dheeraj
    Banerjee, Suman
    Kar, Koushik
    INFOCOM 2007, VOLS 1-5, 2007, : 2281 - +
  • [6] Efficient Load Balancing In Large-Scale Systems
    Mukherjee, D.
    Borst, S. C.
    van Leeuwaarden, J. S. H.
    Whiting, P. A.
    2016 ANNUAL CONFERENCE ON INFORMATION SCIENCE AND SYSTEMS (CISS), 2016,
  • [7] Load balancing in large-scale heterogeneous systems
    Borst, Sem
    QUEUEING SYSTEMS, 2022, 100 (3-4) : 397 - 399
  • [8] Load balancing in large-scale heterogeneous systems
    Sem Borst
    Queueing Systems, 2022, 100 : 397 - 399
  • [9] A dynamic and adaptive load balancing strategy for parallel file system with large-scale I/O servers
    Dong, Bin
    Li, Xiuqiao
    Wu, Qimeng
    Xiao, Limin
    Ruan, Li
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, 72 (10) : 1254 - 1268
  • [10] Tarazu: An Adaptive End-to-end I/O Load-balancing Framework for Large-scale Parallel File Systems
    Paul, Arnab K.
    Neuwirth, Sarah
    Wadhwa, Bharti
    Wang, Feiyi
    Oral, Sarp
    Butt, Ali R.
    ACM TRANSACTIONS ON STORAGE, 2024, 20 (02)