Dynamic erasure coding decision for modern block-oriented distributed storage systems

Cited by: 3
Authors
Ahn, Hoo-Young [1]
Lee, Kyong-Ha [2]
Lee, Yoon-Joon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Taejon 305701, South Korea
[2] KISTI, Sci Data Res Ctr, 245 Daehak Ro, Daejeon 305806, South Korea
Source
JOURNAL OF SUPERCOMPUTING | 2016 / Vol. 72 / Issue 04
Keywords
Distributed storage system; Storage overhead; Hadoop; HDFS; Data replication; Erasure coding; RAID;
DOI
10.1007/s11227-016-1661-7
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Modern block-oriented distributed storage systems such as the Hadoop Distributed File System (HDFS) have proliferated in this era of big data and cloud computing. These systems feature block-level replication: files are partitioned into equal-sized blocks, and multiple copies of each block are then arbitrarily distributed across nodes for fault tolerance and data availability. Under this strategy, however, much storage capacity is wasted on keeping copies of blocks whose data may not be accessed frequently. Distributed storage systems have therefore begun to adopt erasure codes. However, classical parity encoding schemes are hard to apply directly to these systems because block copies are placed arbitrarily across nodes. We present a novel technique, called DynaEC, to address these issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines into parities and then places the parities so that fault tolerance is guaranteed. Parity encoding in DynaEC is performed without any change to the original block placement policy of HDFS, which lets DynaEC work seamlessly with HDFS. Finally, during the encoding procedure each data node encodes its own data blocks without requiring any information about blocks located on other data nodes. As such, the encoding procedure in DynaEC is performed fully in parallel without any synchronization issues. With extensive experiments, we show that DynaEC saves storage volume up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
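The trade-off between replication and parity encoding described in the abstract can be illustrated with a minimal sketch. This is not the DynaEC algorithm, which the abstract does not specify; it assumes a single XOR parity per group of blocks (RAID-5 style), and the names xor_parity, recover, and storage_overhead are hypothetical, chosen only for this illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-sized blocks byte by byte into one parity block.

    Assumption: a single XOR parity per group, not DynaEC's actual code.
    """
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be equal-sized")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(surviving_blocks, parity):
    """Rebuild the one missing block of a group from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


def storage_overhead(data_blocks, redundant_blocks):
    """Return total stored blocks divided by original data blocks."""
    return (data_blocks + redundant_blocks) / data_blocks


# Blocks held by one (hypothetical) data node; encoding needs no other node's data.
local_blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(local_blocks)

# Losing the middle block is tolerated: XOR of the rest and the parity restores it.
restored = recover([local_blocks[0], local_blocks[2]], parity)

replication_factor = storage_overhead(1, 2)  # one block plus two extra copies
parity_factor = storage_overhead(4, 1)       # four data blocks plus one parity
```

Under these assumptions, keeping three copies of every block stores 3.0 times the original data, while a group of four data blocks protected by one parity stores 1.25 times, at the cost of tolerating only one lost block per group.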
Pages: 1312-1341
Page count: 30