Evaluation of distributed recovery in large-scale storage systems

Cited by: 41

Authors:

Xin, Q [1]

Miller, EL [1]

Schwarz, TJE [1]

Affiliation:

[1] Univ Calif Santa Cruz, Storage Syst Res Ctr, Santa Cruz, CA 95064 USA

Source:

13TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS | 2004

DOI:

10.1109/HPDC.2004.1323523

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Storage clusters consisting of thousands of disk drives are now being used both for their large capacity and high throughput. However, their reliability is far worse than that of smaller storage systems due to the increased number of storage nodes. RAID technology is no longer sufficient to guarantee the necessary high data reliability for such systems, because disk rebuild time lengthens as disk capacity grows. In this paper, we present FAst Recovery Mechanism (FARM), a distributed recovery approach that exploits excess disk capacity and reduces data recovery time. FARM works in concert with replication and erasure-coding redundancy schemes to dramatically lower the probability of data loss in large-scale storage systems. We have examined essential factors that influence system reliability, performance, and costs, such as failure detection, disk bandwidth usage for recovery, disk space utilization, disk drive replacement, and system scale, by simulating system behavior under disk failures. Our results show the reliability improvement from FARM and demonstrate the impact of various factors on system reliability. Using our techniques, system designers will be better able to build multi-petabyte storage systems with much higher reliability at lower cost than previously possible.

Pages: 172-181

Page count: 10
