I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning

Cited by: 50
Authors
Chowdhury, Fahim [1]
Zhu, Yue [1]
Heer, Todd [2]
Paredes, Saul [1]
Moody, Adam [2]
Goldstone, Robin [2]
Mohror, Kathryn [2]
Yu, Weikuan [1]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA USA
Funding
U.S. National Science Foundation
DOI
10.1145/3337821.3337902
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Parallel File Systems (PFSs) are frequently deployed on leadership High Performance Computing (HPC) systems to ensure efficient I/O, persistent storage, and scalable performance. Emerging Deep Learning (DL) applications impose new I/O and storage requirements on HPC systems through batched input of small random files. PFSs must therefore offer commensurate features to meet the needs of DL applications. BeeGFS is a recently emerging PFS that has attracted attention from both research and industry because of its performance, scalability, and ease of use. In this paper, with an emphasis on systematic performance analysis, we present the architectural and system features of BeeGFS and perform an experimental evaluation using cutting-edge I/O, metadata, and DL application benchmarks. In particular, we use AlexNet and ResNet-50 models to classify the ImageNet dataset with the Livermore Big Artificial Neural Network Toolkit (LBANN), along with an ImageNet data reader pipeline atop TensorFlow and Horovod. Through extensive performance characterization of BeeGFS, our study documents how to leverage BeeGFS for emerging DL applications.
Pages: 10