A comprehensive repair scheme for distributed storage systems

Cited by: 1
Authors
Chen, Junmei [1]
Li, Zongpeng [1, 2]
Fang, Guang [1]
Hou, Yeqiao [1]
Li, Xianglong [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Keywords
Distributed storage system; Erasure code; Data reliability; Heterogeneous; Cross-rack; Access skew; CODES;
DOI
10.1016/j.comnet.2023.109954
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Modern data storage systems apply erasure codes to provide data reliability efficiently. Previous studies have proposed a series of techniques to balance repair and storage costs, reduce codec complexity, minimize repair time, improve fault tolerance, and enforce system-level service level agreements. These techniques were designed in isolation, which limits performance. We explore the potential advantages of combining them to better meet the requirements of data storage systems and to provide superior system performance. This work proposes a comprehensive scheme for repairing failed data in distributed storage systems. First, we tailor the design of erasure codes to the heterogeneity of storage devices. The core idea is to monitor device performance (e.g., access speed, reliability), compute two coefficients for each device, and use them to select appropriate devices for creating erasure-coded stripes. Second, we leverage the system hierarchy to perform intermediary repair operations, further minimizing cross-rack repair bandwidth. Finally, we propose a new repair scheme adapted to data access skew. To demonstrate the effectiveness of our comprehensive repair scheme, we evaluate various erasure codes via mathematical analysis and experiments on a Ceph cluster. Compared with traditional re-encoding methods and more recent adaptive erasure codes, our scheme achieves significant savings in recovery bandwidth, code-switching bandwidth, repair time, and code-switching time.
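The abstract's second point, intermediary repair inside each rack to reduce cross-rack repair bandwidth, can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single XOR parity block as a stand-in for a general erasure code, and the names `xor_blocks`, `repair_direct`, and `repair_aggregated` are hypothetical. The point it shows is only that, when each rack combines its surviving blocks locally, one block per rack crosses the rack boundary instead of one block per survivor.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_direct(survivors_by_rack, target_rack):
    """Send every surviving block straight to the repair node.

    Returns the rebuilt block and the number of blocks that cross racks.
    """
    all_blocks = [b for blocks in survivors_by_rack.values() for b in blocks]
    cross_rack = sum(
        len(blocks)
        for rack, blocks in survivors_by_rack.items()
        if rack != target_rack
    )
    return xor_blocks(all_blocks), cross_rack


def repair_aggregated(survivors_by_rack, target_rack):
    """Each rack first XORs its own survivors, then sends one partial result.

    Assumption: the code is a single XOR parity, so partial results can be
    combined with another XOR at the repair node.
    """
    partials = []
    cross_rack = 0
    for rack, blocks in survivors_by_rack.items():
        if not blocks:
            continue
        partials.append(xor_blocks(blocks))  # intermediary repair in the rack
        if rack != target_rack:
            cross_rack += 1  # only the partial result leaves the rack
    return xor_blocks(partials), cross_rack


# Six data blocks plus one XOR parity, spread over three racks (made-up layout).
data = [bytes([i + 1] * 4) for i in range(6)]
parity = xor_blocks(data)
lost = data[0]  # block d0 in rack "A" fails
survivors = {
    "A": [data[1], data[2]],
    "B": [data[3], data[4]],
    "C": [data[5], parity],
}

rebuilt_direct, traffic_direct = repair_direct(survivors, "A")
rebuilt_agg, traffic_agg = repair_aggregated(survivors, "A")
print(rebuilt_direct == lost, traffic_direct)
print(rebuilt_agg == lost, traffic_agg)
```

In this made-up layout both strategies rebuild the same block; the difference lies only in how many blocks leave racks "B" and "C". Codes with more than one parity block would need linear combinations over a finite field rather than plain XOR, which this sketch does not cover.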
Pages: 12