Reliability of Centralized vs. Parallel Software Models for Composable Storage Systems

Cited by: 1
Authors
Blaum, Mario [1]
Muench, Paul [1]
Affiliation
[1] IBM Res Div Almaden, San Jose, CA 95120 USA
Keywords
Hyperconverged architectures; hyper-converged infrastructure (HCI); cloud applications; DIMM failure rate; metadata server; composable systems
DOI
10.1109/QRS54544.2021.00064
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Modern storage systems consist of many hardware and software components. The core of these systems is a set of server drawers containing data, at least one of which holds parity (a special case is two mirrored drawers). We analyze the failure rate of two such systems, both based on hyperconverged architectures: one centralized, in which the drawers share the metadata server, and one parallel, in which each drawer has its own metadata server. Parallel systems are inherently more reliable. However, the new CXL and Gen-Z architectures are enabling a centralized approach in which resources from multiple servers are combined into a single virtual server. In this paper we analyze which techniques can bring the probability of failure of the centralized approach close to that of the parallel approach. We identify the probability of Dual In-Line Memory Module (DIMM) failure as the key differentiator between the failure probabilities of the centralized and parallel systems, and we suggest methods to compensate for DIMMs with a high probability of failure.
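To make the centralized versus parallel comparison concrete, the following is a minimal toy sketch and not the model used in the paper. It assumes independent failures, a single parity drawer (so the system tolerates at most one drawer failure), and hypothetical per-drawer and per-metadata-server failure probabilities `p_drawer` and `p_meta`; all function and parameter names are illustrative.

```python
def at_most_one_failure(n: int, q: float) -> float:
    """Probability that at most one of n independent units fails,
    where each unit fails with probability q."""
    return (1 - q) ** n + n * q * (1 - q) ** (n - 1)


def centralized_failure(n: int, p_drawer: float, p_meta: float) -> float:
    """Toy centralized model (assumption): one shared metadata server.
    The system survives only if the shared server survives AND
    at most one of the n drawers fails."""
    return 1 - (1 - p_meta) * at_most_one_failure(n, p_drawer)


def parallel_failure(n: int, p_drawer: float, p_meta: float) -> float:
    """Toy parallel model (assumption): each drawer has its own metadata
    server, and a drawer counts as failed if either its data or its
    metadata server fails. Parity still tolerates one failed drawer."""
    q = 1 - (1 - p_drawer) * (1 - p_meta)
    return 1 - at_most_one_failure(n, q)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    n, p_drawer, p_meta = 4, 0.01, 0.001
    print("centralized:", centralized_failure(n, p_drawer, p_meta))
    print("parallel:   ", parallel_failure(n, p_drawer, p_meta))
```

In this toy, with small failure probabilities, the shared metadata server is a single point of failure that dominates the centralized result, whereas in the parallel case a metadata server failure is absorbed by the parity protection like any other drawer failure. Lowering `p_meta` (for example, by protecting against DIMM failures) narrows the gap between the two.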
Pages: 534 - 542
Number of pages: 9