Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication

Cited by: 1
Authors
Tan, Nigel [1]
Luettgau, Jakob [1]
Marquez, Jack [1]
Teranishi, Keita [2]
Morales, Nicolas [3]
Bhowmick, Sanjukta [4]
Cappello, Franck [5]
Taufer, Michela [1]
Nicolae, Bogdan [5]
Affiliations
[1] Univ Tennessee Knoxville, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
[3] Sandia Natl Labs, POB 5800, Albuquerque, NM 87185 USA
[4] Univ North Texas, Denton, TX USA
[5] Argonne Natl Lab, Lemont, IL USA
Funding
U.S. National Science Foundation
Keywords
Checkpointing; data versioning; incremental storage; deduplication; GPU parallelization
DOI
10.1145/3605573.3605639
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Writing large amounts of data concurrently to stable storage is a typical I/O pattern of many HPC workflows. This pattern introduces high I/O overheads and increases storage space utilization, especially for workflows that need to capture the evolution of data structures with high frequency as checkpoints. In this context, many applications, such as graph pattern matching, perform sparse updates to large data structures between checkpoints. For these applications, incremental checkpointing techniques that save only the differences from one checkpoint to another can dramatically reduce checkpoint sizes, I/O bottlenecks, and storage space utilization. However, such techniques are not without challenges: it is non-trivial to transparently determine what data has changed since a previous checkpoint and to assemble the differences in a compact fashion that does not result in excessive metadata. State-of-the-art data reduction techniques (e.g., compression and de-duplication) have significant limitations when applied to modern HPC applications that leverage GPUs: they are slow at detecting the differences, generate a large amount of metadata to keep track of the differences, and ignore crucial spatiotemporal checkpoint data redundancy. This paper addresses these challenges by proposing a Merkle tree-based incremental checkpointing method that exploits GPUs' high memory bandwidth and massive parallelism. Experimental results at scale show a significant reduction of the I/O overhead and space utilization of checkpointing compared with state-of-the-art incremental checkpointing and compression techniques.
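The general idea of Merkle tree-based change detection mentioned in the abstract can be illustrated with a minimal CPU-only sketch. It is not the paper's GPU implementation. The chunk size, the function names, the choice of SHA-256, and the assumption that both checkpoints have the same length are all hypothetical and chosen only for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size in bytes, kept tiny for illustration


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash each fixed-size chunk of the buffer (these are the tree leaves)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def build_merkle_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return the tree as a list of levels, leaves first and root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Pair adjacent nodes; an unpaired last node is hashed on its own.
        levels.append([
            hashlib.sha256(b"".join(prev[i:i + 2])).digest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def changed_chunks(old: list[list[bytes]], new: list[list[bytes]]) -> list[int]:
    """Walk both trees top-down and return indices of leaves that differ.

    Assumption: both checkpoints contain the same number of chunks.
    """
    assert len(old) == len(new) and len(old[0]) == len(new[0])
    if not new[0]:
        return []
    frontier = [0]  # node indices at the current level, starting at the root
    for level in range(len(new) - 1, 0, -1):
        next_frontier = []
        for idx in frontier:
            if old[level][idx] == new[level][idx]:
                continue  # identical subtree: skip every chunk beneath it
            for child in (2 * idx, 2 * idx + 1):
                if child < len(new[level - 1]):
                    next_frontier.append(child)
        frontier = next_frontier
    return [i for i in frontier if old[0][i] != new[0][i]]


def incremental_checkpoint(prev: bytes, curr: bytes) -> dict[int, bytes]:
    """Return only the chunks of `curr` that differ from `prev`, keyed by index."""
    old_tree = build_merkle_tree(chunk_hashes(prev))
    new_tree = build_merkle_tree(chunk_hashes(curr))
    return {
        i: curr[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        for i in changed_chunks(old_tree, new_tree)
    }
```

For two 20-byte buffers that differ only in their third 4-byte chunk, `incremental_checkpoint` returns a single entry for chunk index 2. Matching interior hashes let the walk skip whole unchanged subtrees, which is what makes sparse updates cheap to detect in this sketch.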
Pages: 665-674
Page count: 10