Even Data Placement for Load Balance in Reliable Distributed Deduplication Storage Systems

Authors
Xu, Min [1 ]
Zhu, Yunfeng [2 ]
Lee, Patrick P. C. [1 ]
Xu, Yinlong [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Hom, Hong Kong, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Modern distributed storage systems often deploy deduplication to remove content-level redundancy and hence improve storage efficiency. However, deduplication inevitably leads to unbalanced data placement across storage nodes, thereby degrading read performance. This paper studies the load balance problem in the setting of a reliable distributed deduplication storage system, which deploys deduplication for storage efficiency and erasure coding for reliability. We argue that in such a setting, it is generally challenging to find a data placement that simultaneously achieves both read balance and storage balance objectives. To this end, we formulate a combinatorial optimization problem, and propose a greedy, polynomial-time Even Data Placement (EDP) algorithm, which identifies a data placement that effectively achieves read balance while maintaining storage balance. We further extend our EDP algorithm to heterogeneous environments. We demonstrate the effectiveness of our EDP algorithm under real-world workloads using both extensive simulations and prototype testbed experiments. In particular, our testbed experiments show that our EDP algorithm reduces the file read time by 37.41% compared to the baseline round-robin placement, and the reduction can further reach 52.11% in a heterogeneous setting.
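The abstract does not spell out EDP's individual steps. To make the read-balance vs. storage-balance tension concrete, here is a minimal Python sketch under simplifying assumptions: files are lists of chunk fingerprints, each unique chunk is stored once on exactly one node (no erasure coding, homogeneous nodes), and a file's read cost is modeled as the largest number of its unique chunks held by any single node, on the assumption that nodes are read in parallel. The greedy rule and the names `round_robin_placement`, `greedy_even_placement`, and `read_cost` are hypothetical illustrations, not the paper's EDP algorithm.

```python
from math import ceil


def round_robin_placement(files, num_nodes):
    """Baseline: assign each unique chunk to nodes in round-robin order of first appearance."""
    placement = {}
    next_node = 0
    for chunks in files:
        for c in chunks:
            if c not in placement:  # duplicates reuse the existing copy
                placement[c] = next_node
                next_node = (next_node + 1) % num_nodes
    return placement


def greedy_even_placement(files, num_nodes):
    """Hypothetical greedy (not the paper's EDP): put each new chunk on the node
    holding the fewest chunks of the current file, never exceeding a per-node cap."""
    unique_total = len({c for chunks in files for c in chunks})
    cap = ceil(unique_total / num_nodes)  # storage balance constraint
    storage = [0] * num_nodes
    placement = {}
    for chunks in files:
        # Count this file's chunks already placed by earlier files (shared via dedup).
        per_file = [0] * num_nodes
        for c in set(chunks):
            if c in placement:
                per_file[placement[c]] += 1
        for c in chunks:
            if c in placement:
                continue
            # Some node is always below cap while chunks remain unplaced.
            candidates = [n for n in range(num_nodes) if storage[n] < cap]
            # Prefer read balance for this file, then storage balance, then lowest id.
            node = min(candidates, key=lambda n: (per_file[n], storage[n], n))
            placement[c] = node
            storage[node] += 1
            per_file[node] += 1
    return placement


def read_cost(files, placement, num_nodes):
    """Assumed per-file cost: max number of the file's unique chunks on one node."""
    costs = []
    for chunks in files:
        load = [0] * num_nodes
        for c in set(chunks):
            load[placement[c]] += 1
        costs.append(max(load))
    return costs


if __name__ == "__main__":
    files = [["a", "b"], ["a", "c"]]  # chunk "a" is shared and stored once
    rr = round_robin_placement(files, 2)
    greedy = greedy_even_placement(files, 2)
    print(read_cost(files, rr, 2))      # [1, 2]: file 2 reads a and c from node 0
    print(read_cost(files, greedy, 2))  # [1, 1]
```

In this toy case, round-robin places the shared chunk `a` and the new chunk `c` on the same node, while the greedy rule sends `c` to the other node, so each file reads one chunk per node and no node exceeds the cap of two chunks. Erasure-coded stripes and heterogeneous nodes, which the paper addresses, are outside this sketch.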
Pages: 349-358
Number of pages: 10