Data Distribution for Heterogeneous Storage Systems

Cited by: 4
Authors
Zhou, Jiang [1]
Chen, Yong [2]
Zheng, Mai [3]
Wang, Weiping [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100864, Peoples R China
[2] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[3] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Funding
US National Science Foundation (NSF);
Keywords
Performance evaluation; Clustering algorithms; Nonvolatile memory; Costs; Throughput; Hash functions; Servers; Parallel/distributed file systems; data distribution; data placement; heterogeneous storage; data replication; PARALLEL FILE-SYSTEMS; AWARE; SCHEME;
DOI
10.1109/TC.2022.3223302
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The exponential growth of data in many science and engineering domains poses significant challenges to storage systems. Data distribution is a critical component of large-scale distributed storage systems and plays a vital role in placing petabytes of data and beyond among tens to hundreds of thousands of storage devices. Meanwhile, heterogeneous storage systems, such as those combining hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage because of their distinct and complementary characteristics. This paper presents a new data distribution algorithm called SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), designed specifically for heterogeneous devices to maximize their benefits. SUORA provides a fully symmetric, highly efficient methodology for distributing data across a hybrid, tiered storage cluster. It divides heterogeneous devices into different buckets and segments, and uses pseudo-random functions to map data onto them with balanced consideration of capacity, performance, and lifetime. By analyzing hotness and access patterns, SUORA gradually moves hot data from HDDs to SCMs to optimize throughput, and moves cold data in the reverse direction for load balance. It combines data replication with migration to significantly reduce movement overhead while making data placement more adaptive to different workloads. Extensive evaluations in simulation and on the Sheepdog storage system show that, by thoroughly considering the distinct characteristics of the various devices, SUORA improves the overall performance efficiency of heterogeneous storage systems.
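The abstract does not give SUORA's bucket and segment layout, so the following is not the paper's algorithm. It is a minimal sketch of the general idea the abstract names (pseudo-random, weight-aware placement), using weighted rendezvous hashing as a stand-in technique. The device names, the single scalar weight per device, and the function names `unit_hash` and `place` are all assumptions made for illustration; a real system would presumably derive the weight from capacity, performance, and lifetime together.

```python
import hashlib
import math


def unit_hash(key: str, device: str) -> float:
    """Map a (key, device) pair to a pseudo-random float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{key}:{device}".encode()).digest()
    # Keep 52 bits so that (value + 0.5) / 2**52 is exact and never 0.0 or 1.0.
    value = int.from_bytes(digest[:8], "big") >> 12
    return (value + 0.5) / 2**52


def place(key: str, devices: dict[str, float], replicas: int = 1) -> list[str]:
    """Pick `replicas` distinct devices for `key` by weighted rendezvous hashing.

    `devices` maps a hypothetical device name to a positive weight
    (an assumed stand-in for capacity or any combined score).
    """
    scored = []
    for name, weight in devices.items():
        u = unit_hash(key, name)
        # -weight / ln(u) is positive; the chance of a device having the
        # highest score is proportional to its weight.
        scored.append((-weight / math.log(u), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:replicas]]


if __name__ == "__main__":
    # Hypothetical cluster: two large HDDs and two small SCM devices.
    cluster = {"hdd-0": 4.0, "hdd-1": 4.0, "scm-0": 1.0, "scm-1": 1.0}
    for obj in ("object-a", "object-b", "object-c"):
        print(obj, place(obj, cluster, replicas=2))
```

Because the mapping depends only on the key and the device table, any client can compute a placement without a central directory, and the share of objects landing on a device tracks its weight. Hotness-driven migration between HDDs and SCMs, which the abstract also describes, is outside the scope of this sketch.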
Pages: 1747-1762
Number of pages: 16