I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning

Cited by: 50
Authors
Chowdhury, Fahim [1 ]
Zhu, Yue [1 ]
Heer, Todd [2 ]
Paredes, Saul [1 ]
Moody, Adam [2 ]
Goldstone, Robin [2 ]
Mohror, Kathryn [2 ]
Yu, Weikuan [1 ]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA USA
Funding
National Science Foundation (US)
DOI
10.1145/3337821.3337902
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Parallel File Systems (PFSs) are frequently deployed on leadership High Performance Computing (HPC) systems to ensure efficient I/O, persistent storage, and scalable performance. Emerging Deep Learning (DL) applications impose new I/O and storage requirements on HPC systems through batched input of small random files, which demands that PFSs provide commensurate features to meet the needs of DL applications. BeeGFS is a recently emerging PFS that has attracted attention from both research and industry because of its performance, scalability, and ease of use. In this paper, we present the architectural and system features of BeeGFS and conduct a systematic performance analysis through experimental evaluation with cutting-edge I/O, metadata, and DL application benchmarks. In particular, we use the AlexNet and ResNet-50 models to classify the ImageNet dataset with the Livermore Big Artificial Neural Network Toolkit (LBANN) and with an ImageNet data reader pipeline atop TensorFlow and Horovod. Through extensive performance characterization of BeeGFS, our study provides useful documentation on how to leverage BeeGFS for emerging DL applications.
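The "batched input of small random files" access pattern that the abstract identifies can be illustrated with a minimal, framework-free Python sketch (the function name, file names, and batch size below are illustrative, not from the paper; real DL pipelines such as TensorFlow's tf.data perform the same shuffle-and-batch reads at ImageNet scale):

```python
import random
import tempfile
from pathlib import Path

def batched_random_reads(paths, batch_size, seed=0):
    """Yield batches of file contents in shuffled order, mimicking the
    'batched input of small random files' I/O pattern of DL data readers."""
    order = list(paths)
    random.Random(seed).shuffle(order)      # new random sample order each epoch
    for i in range(0, len(order), batch_size):
        # Each batch triggers many small, scattered reads on the file system.
        yield [Path(p).read_bytes() for p in order[i:i + batch_size]]

# Create 10 tiny stand-in files (a real workload reads millions of ImageNet JPEGs).
tmpdir = Path(tempfile.mkdtemp())
paths = []
for n in range(10):
    p = tmpdir / f"sample_{n}.bin"
    p.write_bytes(bytes([n]) * 4)           # 4-byte dummy payload
    paths.append(p)

batches = list(batched_random_reads(paths, batch_size=4))
print(len(batches))                         # 10 files / batch size 4 -> 3 batches
```

From the PFS's point of view, the cost of this pattern is dominated by metadata lookups and small random reads rather than streaming bandwidth, which is why the paper evaluates BeeGFS with both metadata and DL application benchmarks.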
Pages: 10
Related Papers (50 total)
  • [1] The role of storage target allocation in applications' I/O performance with BeeGFS
    Boito, Francieli
    Pallez, Guillaume
    Teylo, Luan
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 267 - 277
  • [2] Does Varying BeeGFS Configuration Affect the I/O Performance of HPC Workloads?
    Borkar, Arnav
    Tony, Joel
    Vamsi, Hari K. N.
    Barman, Tushar
    Bhisikar, Yash
    Sreenath, T. M.
    Paul, Arnab K.
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING WORKSHOPS, CLUSTER WORKSHOPS, 2023, : 5 - 7
  • [3] High Performance I/O For Large Scale Deep Learning
    Aizman, Alex
    Maltby, Gavin
    Breuel, Thomas
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 5965 - 5967
  • [4] Improving the I/O of large geophysical models using PnetCDF and BeeGFS
    Brzenski, Jared
    Paolini, Christopher
    Castillo, Jose E.
    PARALLEL COMPUTING, 2021, 104
  • [5] PHDFS: Optimizing I/O performance of HDFS in deep learning cloud computing platform
    Zhu, Zongwei
    Tan, Luchao
    Li, Yinzhen
    Ji, Cheng
    JOURNAL OF SYSTEMS ARCHITECTURE, 2020, 109
  • [6] I/O performance evaluation with Parabench - programmable I/O benchmark
    Mordvinova, Olga
    Runz, Dennis
    Kunkel, Julian M.
    Ludwig, Thomas
    ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS, 2010, 1 (01): 2119 - 2128
  • [7] PARALLEL I/O OPTIMIZATIONS FOR SCALABLE DEEP LEARNING
    Pumma, Sarunya
    Si, Min
    Feng, Wu-chun
    Balaji, Pavan
    2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2017, : 720 - 729
  • [8] A NEW APPROACH TO I/O PERFORMANCE EVALUATION - SELF-SCALING I/O BENCHMARKS, PREDICTED I/O PERFORMANCE
    CHEN, PM
    PATTERSON, DA
    ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1994, 12 (04): 308 - 339
  • [9] Evaluation of hyperspectral data for deep learning model performance
    Butler, Samantha J.
    Price, Stanton R.
    Carley, Samantha S.
    Land, Haley B.
    Price, Steven R.
    ALGORITHMS, TECHNOLOGIES, AND APPLICATIONS FOR MULTISPECTRAL AND HYPERSPECTRAL IMAGING XXX, 2024, 13031
  • [10] Performance Evaluation of Deep Learning Compilers for Edge Inference
    Verma, Gaurav
    Gupta, Yashi
    Malik, Abid M.
    Chapman, Barbara
    2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 858 - 865