Network Aware Reliability Analysis for Distributed Storage Systems

Cited by: 0
Authors
Epstein, Amir [1]
Kolodner, Elliot K. [1]
Sotnikov, Dmitry [1]
Affiliations
[1] IBM Res Haifa, Haifa, Israel
Keywords
CODES
DOI
10.1109/SRDS.2016.40
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
It is hard to measure the reliability of a large distributed storage system, since it is influenced by low-probability failure events that occur over time. Nevertheless, it is critical to be able to predict reliability in order to plan, deploy, and operate the system. Existing approaches suffer from unrealistic assumptions regarding network bandwidth. This paper introduces a new framework that combines simulation and an analytic model to estimate durability for large distributed cloud storage systems. Our approach is the first that takes into account network bandwidth with a focus on the cumulative effect of simultaneous failures on repair time. Using our framework, we evaluate the trade-offs between durability, network, and storage costs for the OpenStack Swift object store, comparing various system configurations and resiliency schemes, including replication and erasure coding. In particular, we show that when accounting for the cumulative effect of simultaneous failures, estimates of the probability of data loss can vary by two to four orders of magnitude.
Pages: 249-258
Number of pages: 10
Related papers
50 in total
  • [1] Analysis of Data Reliability Tradeoffs in Hybrid Distributed Storage Systems
    Tang, Bing
    Fedak, Gilles
    [J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 1546 - 1555
  • [2] Reliability Analysis of Highly Redundant Distributed Storage Systems with Dynamic Refuging
    Akutsu, Hiroaki
    Ueda, Kazunori
    Chiba, Takeru
    Kawaguchi, Tomohiro
    Shimozono, Norio
    [J]. 23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 261 - 268
  • [3] Reliability and Failure Impact Analysis of Distributed Storage Systems with Dynamic Refuging
    Akutsu, Hiroaki
    Ueda, Kazunori
    Chiba, Takeru
    Kawaguchi, Tomohiro
    Shimozono, Norio
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (09): : 2259 - 2268
  • [4] Reliability analysis of distributed storage systems considering data loss and theft
    Jia, Heping
    Peng, Rui
    Ding, Yi
    Shao, Changzheng
    [J]. PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART O-JOURNAL OF RISK AND RELIABILITY, 2020, 234 (02) : 303 - 321
  • [5] Network coding for distributed storage systems
    Dimakis, Alexandros G.
    Godfrey, P. Brighten
    Wainwright, Martin J.
    Ramchandran, Kannan
    [J]. INFOCOM 2007, VOLS 1-5, 2007, : 2000 - +
  • [6] Network Coding for Distributed Storage Systems
    Dimakis, Alexandros G.
    Godfrey, P. Brighten
    Wu, Yunnan
    Wainwright, Martin J.
    Ramchandran, Kannan
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2010, 56 (09) : 4539 - 4551
  • [7] Analysis on the Reliability of Non-repairable and Repairable Network Storage Systems
    Yin, MingYong
    Wu, Chun
    Tao, Yizheng
    [J]. APPLICATIONS AND TECHNIQUES IN INFORMATION SECURITY, ATIS 2014, 2014, 490 : 147 - 158
  • [8] Dynamically Quantifying and Improving the Reliability of Distributed Storage Systems
    Bachwani, Rekha
    Gryz, Leszek
    Bianchini, Ricardo
    Dubnicki, Cezary
    [J]. PROCEEDINGS OF THE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, 2008, : 85 - +
  • [9] RADPA: Reliability-aware Data Placement Algorithm for large-scale network storage systems
    Chen, Tao
    Liu, Fang
    Xiao, Nong
    [J]. HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 648 - 653
  • [10] Reliability-Aware Energy Management for Hybrid Storage Systems
    Felter, Wes
    Hylick, Anthony
    Carter, John
    [J]. 2011 IEEE 27TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2011,