Coding for high availability of a distributed-parallel storage system

Cited by: 12
Authors
Malluhi, QM
Johnston, WE
Affiliations
[1] Jackson State Univ, Dept Comp Sci, Jackson, MS 39217 USA
[2] Ernesto Orlando Lawrence Berkeley Natl Lab, Informat & Comp Sci Div, Berkeley, CA 94720 USA
Keywords
storage systems; availability; scalability; RAID; high performance; distributed systems; error-correcting codes
DOI
10.1109/71.737699
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We have developed a distributed parallel storage system that employs the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies different schemes to enhance the reliability and availability of such network-based distributed storage systems. The general approach of this paper employs "erasure" error-correcting codes that can be used to reconstruct missing information caused by hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. Moreover, the paper presents techniques for reducing the communication and computation overhead incurred while reconstructing missing data from the redundant information. These techniques include clustering, multidimensional coding, and the full two-dimensional parity schemes. The paper considers trade-offs between redundancy, fault tolerance, and complexity of error recovery.
Pages: 1237-1252
Page count: 16
Related papers
50 in total
  • [21] A high-availability software update method for distributed storage systems
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, 152-8552, Japan
    Unknown
    Unknown
    Syst Comput Jpn, 2006, 10 (35-46):
  • [22] Efficient object storage journaling in a distributed parallel file system
    National Center for Computational Sciences, Oak Ridge National Laboratory, United States
    Unknown
    Proc. FAST : USENIX Conf. File Storage Technol., (143-154):
  • [23] AVAILABILITY OF A PARALLEL SYSTEM
    RAMANARAYANAN, R
    USHA, K
    IEEE TRANSACTIONS ON RELIABILITY, 1980, 29 (03) : 281 - 281
  • [24] Scaling Up Set Similarity Joins Using a Cost-Based Distributed-Parallel Framework
    Fier, Fabian
    Freytag, Johann-Christoph
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2021, 2021, 13058 : 17 - 31
  • [25] The Research and Design for High Availability Object Storage System
    Zhan, Ling
    Tan, Zhihu
    Gu, Peng
    Wan, Jiguang
    EIGHTH INTERNATIONAL SYMPOSIUM ON OPTICAL STORAGE AND 2008 INTERNATIONAL WORKSHOP ON INFORMATION DATA STORAGE, 2009, 7125
  • [26] High availability replication strategy for deduplication storage system
    Zhou, Zhengda
    Zhou, Jingli
    Advances in Information Sciences and Service Sciences, 2012, 4 (08) : 115 - 123
  • [27] Data Recovery Approach with Optimized Cauchy Coding in Distributed Storage System
    Funde, Snehalata
    Swain, Gandharba
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 620 - 629
  • [28] Erasure coding for distributed storage: an overview
    S.B.BALAJI
    M.Nikhil KRISHNAN
    Myna VAJHA
    Vinayak RAMKUMAR
    Birenjith SASIDHARAN
    P.Vijay KUMAR
    Science China(Information Sciences), 2018, 61 (10) : 7 - 51
  • [29] Graftage Coding for Distributed Storage Systems
    Rui, Jiayi
    Huang, Qin
    Wang, Zulin
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (04) : 2192 - 2205
  • [30] Special focus on distributed storage coding
    Tang, Xiaohu
    Xia, Shu-Tao
    Tian, Chao
    Huang, Qin
    Xia, Xiang-Gen
    SCIENCE CHINA-INFORMATION SCIENCES, 2018, 61 (10)