Even Data Placement for Load Balance in Reliable Distributed Deduplication Storage Systems

Cited by: 0
Authors
Xu, Min [1]
Zhu, Yunfeng [2]
Lee, Patrick P. C. [1]
Xu, Yinlong [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Modern distributed storage systems often deploy deduplication to remove content-level redundancy and hence improve storage efficiency. However, deduplication inevitably leads to unbalanced data placement across storage nodes, thereby degrading read performance. This paper studies the load balance problem in the setting of a reliable distributed deduplication storage system, which deploys deduplication for storage efficiency and erasure coding for reliability. We argue that in such a setting, it is generally challenging to find a data placement that simultaneously achieves both read balance and storage balance objectives. To this end, we formulate a combinatorial optimization problem, and propose a greedy, polynomial-time Even Data Placement (EDP) algorithm, which identifies a data placement that effectively achieves read balance while maintaining storage balance. We further extend our EDP algorithm to heterogeneous environments. We demonstrate the effectiveness of our EDP algorithm under real-world workloads using both extensive simulations and prototype testbed experiments. In particular, our testbed experiments show that our EDP algorithm reduces the file read time by 37.41% compared to the baseline round-robin placement, and the reduction can further reach 52.11% in a heterogeneous setting.
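To illustrate the imbalance the abstract describes, the following is a minimal sketch, not the paper's EDP algorithm. It compares round-robin placement of unique chunks against a simple greedy heuristic that puts each new chunk on the node currently holding the fewest chunks of the same file, breaking ties by total stored chunks. All names (`place_round_robin`, `place_greedy`, `max_read_load`), the toy workload, and the choice of "most chunks of one file on a single node" as the read-balance measure are assumptions made for illustration; erasure coding and heterogeneous nodes are ignored.

```python
from collections import Counter


def place_round_robin(files, num_nodes):
    """Baseline: assign each newly seen (unique) chunk to nodes in rotation."""
    location = {}  # chunk id -> node index
    next_node = 0
    for chunks in files:
        for chunk in chunks:
            if chunk not in location:  # deduplicated chunks are not stored again
                location[chunk] = next_node
                next_node = (next_node + 1) % num_nodes
    return location


def place_greedy(files, num_nodes):
    """Illustrative heuristic (an assumption, not EDP): place each new chunk on
    the node with the fewest chunks of the current file, then the least storage."""
    location = {}
    stored = [0] * num_nodes  # unique chunks stored per node
    for chunks in files:
        per_file = [0] * num_nodes  # chunks of this file per node
        for chunk in set(chunks):
            if chunk in location:  # duplicates keep their earlier placement
                per_file[location[chunk]] += 1
        for chunk in chunks:
            if chunk not in location:
                node = min(range(num_nodes),
                           key=lambda n: (per_file[n], stored[n], n))
                location[chunk] = node
                per_file[node] += 1
                stored[node] += 1
    return location


def max_read_load(chunks, location):
    """Largest number of a file's distinct chunks held by any single node."""
    load = Counter(location[chunk] for chunk in set(chunks))
    return max(load.values())


# Toy workload: file_b shares two chunks with file_a, and round-robin
# happened to put both of them on node 0.
file_a = [f"a{i}" for i in range(8)]
file_b = ["a0", "a4", "b0", "b1"]
files = [file_a, file_b]

rr = place_round_robin(files, num_nodes=4)
greedy = place_greedy(files, num_nodes=4)

print("round-robin, file_b max load:", max_read_load(file_b, rr))
print("greedy,      file_b max load:", max_read_load(file_b, greedy))
```

In this toy case the two shared chunks of `file_b` already sit on one node, and round-robin adds a third chunk there, whereas the heuristic steers the new chunks to other nodes. The sketch only shows why deduplication can skew per-file read load; it makes no claim about the formulation or results in the paper.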
Pages: 349-358
Page count: 10