Efficient Deduplication in a Distributed Primary Storage Infrastructure

Cited by: 16
Authors
Paulo, Joao [1]
Pereira, Jose [1]
Affiliation
[1] Univ Minho, Dept Informat, Campus Gualtar, P-4710057 Braga, Portugal
Keywords
Primary storage; deduplication; distributed systems
DOI
10.1145/2876509
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A large amount of duplicate data typically exists across volumes of virtual machines in cloud computing infrastructures. Deduplication allows reclaiming these duplicates while improving the cost-effectiveness of large-scale multitenant infrastructures. However, traditional archival and backup deduplication systems impose prohibitive storage overhead for virtual machines hosting latency-sensitive applications. Primary deduplication systems reduce such penalty but rely on special cluster filesystems, centralized components, or restrictive workload assumptions. Also, some of these systems reduce storage overhead by confining deduplication to off-peak periods that may be scarce in a cloud environment. We present DEDIS, a dependable and fully decentralized system that performs cluster-wide off-line deduplication of virtual machines' primary volumes. DEDIS works on top of any unsophisticated storage backend, centralized or distributed, as long as it exports a basic shared block device interface. Also, DEDIS does not rely on data locality assumptions and incorporates novel optimizations for reducing deduplication overhead and increasing its reliability. The evaluation of an open-source prototype shows that minimal I/O overhead is achievable even when deduplication and intensive storage I/O are executed simultaneously. Also, our design scales out and allows collocating DEDIS components and virtual machines in the same servers, thus removing the need for additional hardware.
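The general idea of off-line, block-level deduplication over a shared block store can be illustrated with a toy, single-process model. In this model the write path never hashes anything: writes go out of place to freshly allocated blocks, so shared blocks are never modified (copy-on-write). A later background pass hashes the blocks written since the previous pass and remaps duplicates onto a single reference-counted copy. This is a hedged illustration of the technique only, not DEDIS's actual design: DEDIS is fully decentralized and cluster-wide, while this sketch keeps all state in one object. The class `OfflineDedupStore`, its methods, and the use of SHA-256 with a byte-for-byte check are hypothetical choices made for the example.

```python
import hashlib


class OfflineDedupStore:
    """Hypothetical toy model of off-line block deduplication (not DEDIS's design)."""

    def __init__(self):
        self.blocks = {}     # physical address -> block contents (the "shared device")
        self.refs = {}       # physical address -> number of logical blocks mapped to it
        self.index = {}      # SHA-256 digest -> physical address of the canonical copy
        self.digest_of = {}  # physical address -> digest, for indexed addresses only
        self.lmaps = {}      # volume id -> {logical block number -> physical address}
        self.dirty = set()   # (volume id, lbn) pairs written since the last dedup pass
        self._next = 0

    def _alloc(self, data):
        addr = self._next
        self._next += 1
        self.blocks[addr] = data
        self.refs[addr] = 1
        return addr

    def _unref(self, addr):
        # Drop one reference; free the block and its index entry when unused.
        self.refs[addr] -= 1
        if self.refs[addr] == 0:
            del self.blocks[addr]
            del self.refs[addr]
            digest = self.digest_of.pop(addr, None)
            if digest is not None:
                del self.index[digest]

    def write(self, vol, lbn, data):
        # Foreground write: always out of place, so shared (and indexed) blocks
        # stay immutable; no hashing happens on the latency-critical path.
        lmap = self.lmaps.setdefault(vol, {})
        old = lmap.get(lbn)
        lmap[lbn] = self._alloc(data)
        if old is not None:
            self._unref(old)
        self.dirty.add((vol, lbn))

    def read(self, vol, lbn):
        return self.blocks[self.lmaps[vol][lbn]]

    def dedup_pass(self):
        # Background pass: hash recently written blocks and share duplicates.
        for vol, lbn in sorted(self.dirty):
            addr = self.lmaps[vol][lbn]
            digest = hashlib.sha256(self.blocks[addr]).digest()
            canonical = self.index.get(digest)
            if canonical is None:
                self.index[digest] = addr
                self.digest_of[addr] = digest
            elif self.blocks[canonical] == self.blocks[addr]:  # guard against collisions
                self.refs[canonical] += 1
                self.lmaps[vol][lbn] = canonical
                self._unref(addr)
        self.dirty.clear()
```

For example, two volumes that each write the same zero-filled block occupy two physical blocks until `dedup_pass()` runs, and one afterwards. A later overwrite by either volume allocates a new block and leaves the other volume's view unchanged. Keeping blocks immutable means the content index can never point at stale data, which is one simple way to make a background pass safe to run alongside foreground I/O.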
Pages: 35
Related papers (50 in total)
  • [1] Scalable, Efficient, and Policy-aware Deduplication for Primary Distributed Storage Systems
    Fingler, Henrique
    Ra, Moo-Ryong
    Panta, Rajesh
    2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019, : 180 - 187
  • [2] Distributed Exact Deduplication for Primary Storage Infrastructures
    Paulo, Joao
    Pereira, Jose
    DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS (DAIS 2014), 2014, 8460 : 52 - 66
  • [3] When Deduplication Meets Migration: An Efficient and Adaptive Strategy in Distributed Storage Systems
    Cheng, Geyao
    Luo, Lailong
    Xia, Junxu
    Guo, Deke
    Sun, Yuchen
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (10) : 2749 - 2766
  • [4] FASR: An Efficient Feature-Aware Deduplication Method in Distributed Storage Systems
    Yao, Wenbin
    Hao, Mengyao
    Hou, Yingying
    Li, Xiaoyong
    IEEE ACCESS, 2022, 10 : 15311 - 15321
  • [5] Synchronization and Deduplication in Coded Distributed Storage Networks
    El Rouayheb, Salim
    Goparaju, Sreechakra
    Kiah, Han Mao
    Milenkovic, Olgica
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2016, 24 (05) : 3056 - 3069
  • [6] DBLK: Deduplication for Primary Block Storage
    Tsuchiya, Yoshihiro
    Watanabe, Takashi
    2011 IEEE 27TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2011
  • [7] Secure and Efficient Data Deduplication in JointCloud Storage
    Zhang, Di
    Le, Junqing
    Mu, Nankun
    Wu, Jiahui
    Liao, Xiaofeng
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2023, 11 (01) : 156 - 167
  • [8] Data Deduplication for Efficient Cloud Storage and Retrieval
    Misal, Rishikesh
    Perumal, Boominathan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (05) : 922 - 927
  • [9] D3: A Dynamic Dual-Phase Deduplication Framework for Distributed Primary Storage
    Yin, Jianwei
    Tang, Yan
    Deng, Shuiguang
    Li, Ying
    Zomaya, Albert Y.
    IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (02) : 193 - 207
  • [10] A Simulation Analysis of Reliability in Primary Storage Deduplication
    Fu, Min
    Lee, Patrick P. C.
    Feng, Dan
    Chen, Zuoning
    Xiao, Yu
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2016, : 199 - 208