Efficient Deduplication in a Distributed Primary Storage Infrastructure

Cited by: 16
Authors
Paulo, Joao [1]
Pereira, Jose [1]
Affiliations
[1] Univ Minho, Dept Informat, Campus Gualtar, P-4710057 Braga, Portugal
Keywords
Primary storage; deduplication; distributed systems
DOI
10.1145/2876509
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A large amount of duplicate data typically exists across the volumes of virtual machines in cloud computing infrastructures. Deduplication reclaims the space taken by these duplicates and improves the cost-effectiveness of large-scale multitenant infrastructures. However, traditional archival and backup deduplication systems impose prohibitive storage overhead on virtual machines hosting latency-sensitive applications. Primary deduplication systems reduce this penalty but rely on special cluster filesystems, centralized components, or restrictive workload assumptions. Some of these systems also reduce storage overhead by confining deduplication to off-peak periods, which may be scarce in a cloud environment. We present DEDIS, a dependable and fully decentralized system that performs cluster-wide off-line deduplication of virtual machines' primary volumes. DEDIS works on top of any unsophisticated storage backend, centralized or distributed, as long as it exports a basic shared block device interface. DEDIS does not rely on data locality assumptions, and it incorporates novel optimizations that reduce deduplication overhead and increase reliability. The evaluation of an open-source prototype shows that minimal I/O overhead is achievable even when deduplication and intensive storage I/O run simultaneously. Our design also scales out and allows DEDIS components and virtual machines to be collocated on the same servers, sparing the need for additional hardware.
Pages: 35
Related papers
  • [21] Offline Selective Data Deduplication for Primary Storage Systems
    Park, Sejin
    Park, Chanik
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (02): 370-382
  • [22] Secure auditing and deduplication with efficient ownership management for cloud storage
    Wang, Min
    Xu, Lujun
    Hao, Rong
    Yang, Ming
    JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 142
  • [23] Nature-Inspired Enhanced Data Deduplication for Efficient Cloud Storage
    Madhubala, G.
    Priyadharshini, R.
    Ranjitham, P.
    Baskaran, Santhi
    2014 INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION TECHNOLOGY (ICRTIT), 2014
  • [24] Secure and Efficient Deduplication for Cloud Storage with Dynamic Ownership Management
    Lee, Mira
    Seo, Minhye
    APPLIED SCIENCES-BASEL, 2023, 13 (24)
  • [25] Provisioning an efficient data deduplication model for cloud storage and integrity
    Kumar, Doddi Suresh
    Srinivasu, Nulaka
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024
  • [26] Secure cloud storage auditing with deduplication and efficient data transfer
    Yu, Jingze
    Shen, Wenting
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (02): 2203-2215
  • [27] Secure cloud storage auditing with deduplication and efficient data transfer
    Jingze Yu
    Wenting Shen
    Cluster Computing, 2024, 27: 2203-2215
  • [28] Efficient Integrity Auditing Mechanism With Secure Deduplication for Blockchain Storage
    Zhang, Qingyang
    Sui, Dongfang
    Cui, Jie
    Gu, Chengjie
    Zhong, Hong
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (08): 2365-2376
  • [29] SEED: Enabling Serverless and Efficient Encrypted Deduplication for Cloud Storage
    Shin, Youngjoo
    Koo, Dongyoung
    Yun, Joobeom
    Hur, Junbeom
    2016 8TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM 2016), 2016: 482-487
  • [30] Efficient cross user Data Deduplication in Remote Data Storage
    Prajapati, Priteshkumar
    Shah, Parth
    2014 INTERNATIONAL CONFERENCE FOR CONVERGENCE OF TECHNOLOGY (I2CT), 2014