Data Distribution for Heterogeneous Storage Systems

Times cited: 4
Authors
Zhou, Jiang [1 ]
Chen, Yong [2 ]
Zheng, Mai [3 ]
Wang, Weiping [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100864, Peoples R China
[2] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[3] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Funding
U.S. National Science Foundation
Keywords
Performance evaluation; Clustering algorithms; Nonvolatile memory; Costs; Throughput; Hash functions; Servers; Parallel/distributed file systems; data distribution; data placement; heterogeneous storage; data replication; PARALLEL FILE-SYSTEMS; AWARE; SCHEME;
DOI
10.1109/TC.2022.3223302
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The exponential growth of data in many science and engineering domains poses significant challenges to storage systems. Data distribution is a critical component of large-scale distributed storage systems and plays a vital role in placing petabytes of data and beyond across tens to hundreds of thousands of storage devices. Meanwhile, heterogeneous storage systems, such as those combining hard disk drives (HDDs) and storage class memories (SCMs), have become increasingly popular for massive data storage because of their distinct and complementary characteristics. This paper presents a new data distribution algorithm, SUORA (Scalable and Uniform storage via Optimally-adaptive and Random number Addressing), designed specifically for heterogeneous devices to maximize their benefits. SUORA provides a fully symmetric, highly efficient methodology for distributing data across a hybrid, tiered storage cluster. It divides heterogeneous devices into different buckets and segments and uses pseudo-random functions to map data onto them while balancing capacity, performance, and lifetime. By analyzing data hotness and access patterns, SUORA gradually moves hot data from HDDs to SCMs to optimize throughput, and moves cold data in the reverse direction to balance load. It combines data replication with migration to significantly reduce movement overhead while making data placement more adaptive to different workloads. Extensive evaluations, both in simulation and on the Sheepdog storage system, show that by thoroughly accounting for the distinct characteristics of the various devices, SUORA improves the overall performance efficiency of heterogeneous storage systems.
Pages: 1747-1762
Number of pages: 16