Algorithms for optimal replica placement under correlated failure in hierarchical failure domains

Cited by: 0
Authors
Mills, K. Alex [1 ]
Chandrasekaran, R. [1 ]
Mittal, Neeraj [1 ]
Affiliations
[1] Univ Texas Dallas, 800 W Campbell Rd, Richardson, TX 75080 USA
Funding
US National Science Foundation
Keywords
Replica placement; Correlated failure; Combinatorial optimization; Fault-tolerant storage; Data center management; RELIABILITY;
DOI
10.1016/j.tcs.2020.01.004
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In data centers, data replication is the primary method used to ensure availability of customer data. To avoid correlated failure, cloud storage infrastructure providers model hierarchical failure domains using a tree, and avoid placing a large number of data replicas within the same failure domain (i.e., on the same branch of the tree). Typical best practices ensure that replicas are distributed across failure domains, but relatively little is known concerning optimization algorithms for distributing data replicas. Using a hierarchical model, we answer how to distribute replicas across failure domains optimally. We formulate a novel optimization problem for replica placement in data centers. As part of our problem, we formalize and present a new criterion for optimizing a replica placement. Our overall goal is to choose placements in which correlated failures disable as few replicas as possible. In this work, we provide two optimization algorithms for dependency models represented by trees. We first present an $O(n + \rho \log \rho)$ time dynamic programming algorithm for optimally placing $\rho$ replicas of a single block on the leaves (representing servers) of a tree with $n$ vertices. We next consider the problem of optimally placing replicas of multiple blocks of data, where every block may have a different replication factor. For this problem, we give a dynamic programming algorithm that runs in $O(n \rho_{\max}^{3} \delta^{2} m^{\mathrm{poly}(\delta)})$ time, where $m$ denotes the number of blocks, $\rho_{\max}$ denotes the maximum replication factor of a block, and $\delta$ denotes the maximum difference in the replication factors of any two blocks. The running time of the algorithm is polynomial when $\delta$, which we refer to as the skew, is a constant. © 2020 Elsevier B.V. All rights reserved.
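To make the tree model in the abstract concrete, the following minimal sketch counts how many replicas of a single block each failure domain would disable under a given placement. It is not the paper's dynamic programming algorithm, and it does not implement the paper's optimization criterion, which the abstract does not spell out. The topology, the node names, and the helpers `replicas_per_domain` and `worst_loss` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def replicas_per_domain(parent, placement):
    """Count the replicas located under each failure domain.

    parent: dict mapping each node to its parent (the root maps to None).
    placement: iterable of leaf nodes (servers) that each hold one replica.
    """
    counts = defaultdict(int)
    for leaf in placement:
        node = leaf
        # Every ancestor of the leaf is a failure domain containing this replica.
        while node is not None:
            counts[node] += 1
            node = parent[node]
    return dict(counts)

def worst_loss(parent, placement):
    """Largest number of replicas disabled by the failure of one non-root domain."""
    counts = replicas_per_domain(parent, placement)
    return max(c for node, c in counts.items() if parent[node] is not None)

# Hypothetical topology: one data center, three racks, two servers per rack.
parent = {
    "dc": None,
    "rack1": "dc", "rack2": "dc", "rack3": "dc",
    "s1": "rack1", "s2": "rack1",
    "s3": "rack2", "s4": "rack2",
    "s5": "rack3", "s6": "rack3",
}

clustered = ["s1", "s2", "s3"]  # two replicas share rack1
spread = ["s1", "s3", "s5"]     # one replica per rack

print(worst_loss(parent, clustered))
print(worst_loss(parent, spread))
```

In this toy topology, the clustered placement loses two of its three replicas if `rack1` fails, while the spread placement loses at most one replica to any single rack failure. This is the kind of difference a placement criterion over failure domains is meant to capture.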
Pages: 482-518
Page count: 37