Exploring Erasure Coding Techniques for High Availability of Intermediate Data

Times cited: 0
Authors
Zhang, Zhe [1]
Bockelman, Brian [2]
Weitzel, Derek [1]
Swanson, David [1]
Affiliations
[1] Univ Nebraska Lincoln, Holland Comp Ctr, Lincoln, NE 68588 USA
[2] Morgridge Inst Res, Madison, WI 53715 USA
Keywords
Intermediate data; Erasure code; Data availability; Proactive relocation; Redundancy localization; MTTDL; Network bandwidth; Management
DOI
10.1109/CCGrid49817.2020.00012
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scientific computing workflows generate enormous amounts of distributed data that is short-lived yet critical for job completion time. This class of data is called intermediate data. A common way to achieve high data availability is to replicate data. However, the increasing scale of intermediate data generated by modern scientific applications demands new storage techniques that improve storage efficiency. Erasure codes, as an alternative, can use less storage space while maintaining similar data availability. In this paper, we adopt erasure codes for storing intermediate data and compare their performance with replication. We also use the Mean-Time-To-Data-Loss (MTTDL) metric to estimate the lifetime of intermediate data. We propose an algorithm that proactively relocates data redundancy from vulnerable machines to reliable ones, improving data availability at the cost of some extra network overhead. Furthermore, we propose an algorithm that places the redundancy units of a data item physically close to each other on the network, reducing the network bandwidth needed to reconstruct the data when it is accessed.
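Note: the abstract's claim that erasure codes use less storage than replication at similar availability can be illustrated with a minimal sketch. The parameters below (3-way replication, a (6, 3) code) and all function names are illustrative assumptions, not values or code from the paper. The sketch also assumes a maximum-distance-separable code such as Reed-Solomon, where any k of the k + m units suffice to rebuild the data.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw units stored per unit of user data with `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies, 1)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw units stored per unit of user data with k data and m parity units."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def replication_tolerated_losses(copies: int) -> int:
    """Data survives as long as one replica remains."""
    return copies - 1


def erasure_tolerated_losses(m: int) -> int:
    """Assumes an MDS code: any m of the k + m units may be lost."""
    return m


if __name__ == "__main__":
    # Hypothetical configurations chosen only for illustration.
    copies, k, m = 3, 6, 3
    print("replication:", float(replication_overhead(copies)), "x storage,",
          replication_tolerated_losses(copies), "losses tolerated")
    print("erasure code:", float(erasure_overhead(k, m)), "x storage,",
          erasure_tolerated_losses(m), "losses tolerated")
```

Under these assumed parameters, 3-way replication stores 3 units per unit of data and survives 2 lost units, while the (6, 3) code stores 1.5 units and survives 3. The saving is not free: rebuilding a lost unit requires reading k surviving units over the network, which is the bandwidth cost that the redundancy-localization algorithm in the abstract targets.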
Pages: 865 - 872
Number of pages: 8
Related papers
50 in total
  • [1] High availability in DHTs: Erasure coding vs. replication
    Rodrigues, R
    Liskov, B
    PEER-TO-PEER SYSTEMS IV, 2005, 3640 : 226 - 239
  • [2] Fast Erasure Coding for Data Storage: A Comprehensive Study of the Acceleration Techniques
    Zhou, Tianli
    Tian, Chao
    ACM TRANSACTIONS ON STORAGE, 2020, 16 (01)
  • [3] Fast Erasure Coding for Data Storage: A Comprehensive Study of the Acceleration Techniques
    Zhou, Tianli
    Tian, Chao
    PROCEEDINGS OF THE 17TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2019 : 317 - 329
  • [4] Parallel Erasure Coding: Exploring Task Parallelism in Erasure Coding for Enhanced Bandwidth and Energy Efficiency
    Chen, Hsing-Hung
    Fu, Song
    2016 IEEE INTERNATIONAL CONFERENCE ON NETWORKING ARCHITECTURE AND STORAGE (NAS), 2016
  • [5] On Data Parallelism of Erasure Coding in Distributed Storage Systems
    Li, Jun
    Li, Baochun
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017 : 45 - 56
  • [6] Collaborative Data Collection with Opportunistic Network Erasure Coding
    Xu, Mingsen
    Song, Wen-Zhan
    Zhao, Yichuan
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (10) : 1941 - 1950
  • [7] Efficient Heuristic Replication Techniques for High Data Availability in Cloud
    Chandrakala H.L.
    Loganathan R.
    Computer Systems Science and Engineering, 2023, 45 (03) : 3151 - 3164
  • [8] ERASURE TECHNIQUES FOR HIGH-DENSITY RECORDING
    CHRISTENSEN, ER
    FINKELSTEIN, BI
    IEEE TRANSACTIONS ON MAGNETICS, 1985, 21 (05) : 1377 - 1379
  • [9] Sparsity exploiting erasure coding for distributed storage of versioned data
    Harshan, J.
    Oggier, Frederique
    Datta, Anwitaman
    COMPUTING, 2016, 98 (12) : 1305 - 1329
  • [10] Is it time to revisit Erasure Coding in Data-intensive clusters?
    Darrous, Jad
    Ibrahim, Shadi
    Perez, Christian
    2019 IEEE 27TH INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2019), 2019 : 165 - 178