Coding for high availability of a distributed-parallel storage system

Cited by: 12

Authors:

Malluhi, QM

Johnston, WE

Affiliations:

[1] Jackson State Univ, Dept Comp Sci, Jackson, MS 39217 USA

[2] Ernesto Orlando Lawrence Berkeley Natl Lab, Informat & Comp Sci Div, Berkeley, CA 94720 USA

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 1998 / Vol. 9 / No. 12

Keywords:

storage systems; availability; scalability; RAID; high performance; distributed systems; error-correcting codes;

DOI:

10.1109/71.737699

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

We have developed a distributed parallel storage system that employs the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies different schemes to enhance the reliability and availability of such network-based distributed storage systems. The general approach of this paper employs "erasure" error-correcting codes that can be used to reconstruct missing information caused by hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. Moreover, the paper presents techniques for reducing the communication and computation overhead incurred while reconstructing missing data from the redundant information. These techniques include clustering, multidimensional coding, and the full two-dimensional parity schemes. The paper considers trade-offs between redundancy, fault tolerance, and complexity of error recovery.
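The abstract names two-dimensional parity as one of the erasure-recovery techniques. The following is a minimal illustrative sketch of that general idea, not the paper's own algorithm: data blocks are arranged in a grid, one XOR parity block is kept per row and per column, and any row or column with exactly one erased block is repaired until nothing more can be recovered. The names `xor_blocks`, `encode_2d`, and `recover_2d` are hypothetical, and the sketch assumes that all blocks have equal length and that the parity blocks themselves survive.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_2d(grid):
    """Return (row_parity, col_parity) for a rectangular grid of blocks."""
    row_parity = [xor_blocks(row) for row in grid]
    col_parity = [xor_blocks(col) for col in zip(*grid)]
    return row_parity, col_parity


def recover_2d(grid, row_parity, col_parity):
    """Repair erased blocks (marked None) in place.

    Assumption for this sketch: parity blocks are intact. A row or column
    is repairable when exactly one of its blocks is missing. Returns True
    if every block was recovered, False otherwise.
    """
    rows, cols = len(grid), len(grid[0])
    progress = True
    while progress:
        progress = False
        for r in range(rows):
            missing = [c for c in range(cols) if grid[r][c] is None]
            if len(missing) == 1:
                present = [b for b in grid[r] if b is not None]
                grid[r][missing[0]] = xor_blocks(present + [row_parity[r]])
                progress = True
        for c in range(cols):
            missing = [r for r in range(rows) if grid[r][c] is None]
            if len(missing) == 1:
                present = [grid[r][c] for r in range(rows)
                           if grid[r][c] is not None]
                grid[missing[0]][c] = xor_blocks(present + [col_parity[c]])
                progress = True
    return all(block is not None for row in grid for block in row)
```

In this sketch, two erasures in the same row cannot be repaired from the row parity alone, but each can be repaired from its column parity. This illustrates the trade-off the abstract mentions between redundancy and fault tolerance: the extra parity dimension costs storage but tolerates more erasure patterns.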

Pages: 1237 - 1252

Page count: 16

50 items in total

[1] Approaches for a reliable high-performance distributed-parallel storage system
Malluhi, QM
Johnston, WE
PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, 1996, : 500 - 509
[2] DisCAS: A distributed-parallel computer algebra system
Wu, YW
Yang, GW
Zheng, WM
Lin, DD
COMPUTATIONAL SCIENCE - ICCS 2004, PROCEEDINGS, 2004, 3039 : 295 - 302
[3] Boundary element cluster computing on distributed-parallel workstations
Kamiya, N
Iwase, H
Kita, E
BOUNDARY ELEMENT TECHNOLOGY XI, 1996, : 407 - 414
[4] RELEVANCE OF NETWORK THEORY TO MODELS OF DISTRIBUTED-PARALLEL PROCESSING
MURATA, T
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 1980, 310 (01): : 41 - 50
[5] Distributed-parallel CFD computation for all fuel assemblies in PWR core
Chen, Guangliang
Wang, Jijun
Zhang, Zhijian
Tian, Zhaofei
Li, Lei
Kang, Huilun
Jin, Yuguan
ANNALS OF NUCLEAR ENERGY, 2020, 141
[6] The Reliability Design and Availability Analysis of a Distributed Storage System
YANG Xiaohui
Wuhan University Journal of Natural Sciences, 2006, (06) : 1919 - 1922
[7] Availability evaluation of a video surveillance system with distributed storage
Borges, Ivson
Andrade, Ermeson
Silva, Francisco Airton
Callou, Gustavo
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (04):
[8] DIVE-C: Distributed-parallel Virtual Environment on Cloud computing platform
Jung, In-Yong
Han, Byong-John
Lee, Hanku
Jeong, Chang-Sung
International Journal of Multimedia and Ubiquitous Engineering, 2013, 8 (05): : 19 - 29
[9] RocketHA: A High Availability Design Paradigm for Distributed Log-Based Storage System
Ji, Juntao
Jin, Rongtong
Fu, Yubao
Gu, Yinyou
Tsai, Tsung-han
Lin, Qingshan
2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE, 2023, : 1819 - 1824
[10] Distributed-Parallel Road Traffic Simulator for Clusters of Multi-core Computers
Potuzak, Tomas
2012 IEEE/ACM 16TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2012, : 195 - 201
