Dynamic erasure coding decision for modern block-oriented distributed storage systems

Cited by: 3
Authors
Ahn, Hoo-Young [1]
Lee, Kyong-Ha [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Taejon 305701, South Korea
[2] KISTI, Sci Data Res Ctr, 245 Daehak Ro, Daejeon 305806, South Korea
Source
JOURNAL OF SUPERCOMPUTING | 2016, Vol. 72, No. 4
Keywords
Distributed storage system; Storage overhead; Hadoop; HDFS; Data replication; Erasure coding; RAID
DOI
10.1007/s11227-016-1661-7
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern block-oriented distributed storage systems such as the Hadoop distributed file system have proliferated in this era of big data and cloud computing. These systems feature block-level replication: files are partitioned into equal-sized blocks, and multiple copies of each block are arbitrarily distributed across nodes for fault tolerance and data availability. Under this strategy, however, much storage capacity is wasted on block copies whose data may be accessed only infrequently. Distributed storage systems have therefore begun to adopt erasure codes. Classical parity encoding schemes, however, are hard to apply directly to these systems because block copies are placed arbitrarily across nodes. We present a novel technique, called DynaEC, to address these issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines into parities and then places the parities so that fault tolerance is guaranteed. Parity encoding in DynaEC is performed without any change to the original block placement policy of the Hadoop distributed file system, which lets DynaEC work seamlessly with it. Finally, during the encoding procedure each data node encodes its own data blocks and requires no information about blocks located on other data nodes. As such, the encoding procedure in DynaEC is performed fully in parallel without any synchronization issues. With extensive experiments, we show that DynaEC saves storage volume up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
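The storage trade-off the abstract describes can be illustrated with a minimal sketch. This is not the DynaEC algorithm, whose details are not given here; it is generic single-parity XOR encoding over equal-sized blocks, with hypothetical helper names (`xor_parity`, `recover`, `storage_overhead`) and toy 4-byte blocks chosen only for illustration.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks byte by byte into one parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be equal-sized")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


def storage_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Ratio of stored blocks to original data blocks."""
    return (data_blocks + parity_blocks) / data_blocks


# Illustrative data only: three toy blocks protected by one parity block.
blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(blocks)

# Simulate losing the second block and rebuilding it.
rebuilt = recover([blocks[0], blocks[2]], parity)

# Keeping three full copies of every block stores 3.0x the data;
# three data blocks plus one parity block store about 1.33x.
replication_overhead = 3.0
parity_overhead = storage_overhead(3, 1)
```

A single XOR parity tolerates the loss of only one block per group, whereas three copies tolerate two losses, so the saving comes with a weaker guarantee in this toy setting. Practical systems typically use codes with several parity blocks to close that gap.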
Pages: 1312-1341
Page count: 30