Towards I/O analysis of HPC systems and a generic architecture to collect access patterns

Cited by: 11
Authors
Wiedemann, Marc C. [1, 2]
Kunkel, Julian M. [2]
Zimmer, Michaela [2]
Ludwig, Thomas [2]
Resch, Michael [3]
Boenisch, Thomas [3]
Wang, Xuan [3]
Chut, Andriy [3]
Aguilera, Alvaro [4]
Nagel, Wolfgang E. [4]
Kluge, Michael [4]
Mickler, Holger [4]
Affiliations
[1] Bundesstr 45a, D-20146 Hamburg, Germany
[2] Univ Hamburg, Deutsch Klimarechenzentrum GmbH, Hamburg, Germany
[3] Univ Stuttgart, High Performance Comp Ctr Stuttgart HLRS, Stuttgart, Germany
[4] Tech Univ Dresden, Zentrum Informationsdienste & Hochleistungsrechne, Dresden, Germany
Source
Keywords
I/O analysis; I/O path; Causality tree
DOI
10.1007/s00450-012-0221-5
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In high-performance computing (HPC) applications, a single high-level I/O call triggers activity on a multitude of hardware components. HPC systems are massively parallel machines backed by large storage systems and several internal software layers, and the complex interplay of these components currently makes it impossible to identify the causes and locations of I/O bottlenecks. Existing tools indicate when a bottleneck occurs but provide little guidance on identifying its cause or improving the situation. We have therefore initiated the Scalable I/O for Extreme Performance (SIOX) project to find solutions to this problem. In SIOX, we will build a system that records access information on all layers and components, recognizes access patterns, and characterizes the I/O system. The system will ultimately be able to recognize the causes of I/O bottlenecks and propose optimizations for the I/O middleware that improve I/O performance, such as throughput and latency. Furthermore, the SIOX system will support decision making when planning new I/O systems. In this paper, we introduce the SIOX system and describe its current status. We first outline our approach for collecting the required access information. We then present the architectural concept, the methods for reconstructing the I/O path, and an excerpt of the interface for data collection. The paper focuses especially on the architecture, which collects and combines the relevant access information along the I/O path and is responsible for transferring this information efficiently. An abstract modelling approach allows us to better understand the complexity of analysing I/O activities on parallel computing systems, and an abstract interface allows us to adapt the SIOX system to various HPC file systems.
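To make the idea of reconstructing the I/O path as a causality tree more concrete, the following is a minimal sketch and not the SIOX interface. It assumes a hypothetical record format in which every activity carries its own id and, optionally, the id of the activity that caused it; the names `Activity`, `build_causality_tree`, and `slowest_path`, the layer labels, and the timings are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Activity:
    """One recorded I/O event on some layer (hypothetical record format)."""
    aid: int                      # unique activity id
    layer: str                    # e.g. "MPI-IO", "POSIX", "file-server"
    duration_ms: float            # time spent in this activity
    cause: Optional[int] = None   # id of the activity that triggered this one
    children: List["Activity"] = field(default_factory=list)


def build_causality_tree(records: List[Activity]) -> List[Activity]:
    """Link each activity to its cause and return the root activities."""
    by_id: Dict[int, Activity] = {a.aid: a for a in records}
    roots: List[Activity] = []
    for a in records:
        parent = by_id.get(a.cause) if a.cause is not None else None
        if parent is None:
            # No cause, or the cause was never recorded: treat as a root.
            roots.append(a)
        else:
            parent.children.append(a)
    return roots


def slowest_path(node: Activity) -> List[Activity]:
    """Follow the longest-running child at each level (a crude bottleneck hint)."""
    path = [node]
    while node.children:
        node = max(node.children, key=lambda c: c.duration_ms)
        path.append(node)
    return path


# Illustrative data only: one high-level call fanning out to lower layers.
records = [
    Activity(1, "MPI-IO", 12.0),
    Activity(2, "POSIX", 5.0, cause=1),
    Activity(3, "POSIX", 6.5, cause=1),
    Activity(4, "file-server", 6.0, cause=3),
]
roots = build_causality_tree(records)
print([a.layer for a in slowest_path(roots[0])])
# prints ['MPI-IO', 'POSIX', 'file-server']
```

Linking activities by an explicit cause id keeps the reconstruction independent of any particular file system, which loosely mirrors the abstract interface mentioned above; a real collector would also have to deal with distributed clocks and incomplete records, which this sketch ignores.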
Pages: 241-251
Number of pages: 11
Related papers
50 in total
  • [1] Detecting I/O Access Patterns of HPC Workloads at Runtime
    Bez, Jean Luca
    Boito, Francieli Zanon
    Nou, Ramon
    Miranda, Alberto
    Cortes, Toni
    Navaux, Philippe O. A.
    [J]. 2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019, : 80 - 87
  • [2] I/O Access Patterns in HPC Applications: A 360-Degree Survey
    Bez, Jean Luca
    Byna, Suren
    Ibrahim, Shadi
    [J]. ACM COMPUTING SURVEYS, 2024, 56 (02)
  • [3] An In-Depth I/O Pattern Analysis in HPC Systems
    Bang, Jiwoo
    Kim, Chungyong
    Wu, Kesheng
    Sim, Alex
    Byna, Suren
    Sung, Hanul
    Eom, Hyeonsang
    [J]. 2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), 2021, : 400 - 405
  • [4] Towards Self-optimization in HPC I/O
    Zimmer, Michaela
    Kunkel, Julian Martin
    Ludwig, Thomas
    [J]. SUPERCOMPUTING (ISC 2013), 2013, 7905 : 422 - 434
  • [5] Towards a Generic Control Architecture of Rescue Robot Systems
    Ali, Syed Irtiza
    Mertsching, Baerbel
    [J]. 2008 IEEE INTERNATIONAL WORKSHOP ON SAFETY, SECURITY & RESCUE ROBOTICS, 2008, : 89 - 94
  • [6] Towards Model Driven Architecture and Analysis of System of Systems Access Control
    El Hachem, Jamal
    [J]. 2015 IEEE/ACM 37TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, VOL 2, 2015, : 867 - 870
  • [7] Towards On-Demand I/O Forwarding in HPC Platforms
    Bez, Jean Luca
    Boito, Francieli Z.
    Miranda, Alberto
    Nou, Ramon
    Cortes, Toni
    Navaux, Philippe O. A.
    [J]. PROCEEDINGS OF 2020 IEEE/ACM FIFTH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP (PDSW 2020), 2020, : 7 - 14
  • [8] Evaluating Asynchronous Parallel I/O on HPC Systems
    Ravi, John
    Byna, Suren
    Koziol, Quincey
    Tang, Houjun
    Becchi, Michela
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 211 - 221
  • [9] Towards the design of a generic systems architecture for remote patient monitoring
    Bratan, T.
    Clarke, M.
    [J]. 2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 106 - 109
  • [10] Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems
    Ramesh, Bharath
    Hashmi, Jahanzeb Maqbool
    Xu, Shulei
    Shafi, Aamir
    Ghazimirsaeed, Mahdieh
    Bayatpour, Mohammadreza
    Subramoni, Hari
    Panda, Dhabaleswar K.
    [J]. 2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), 2021, : 272 - 281