SeqCDC: Hashless Content-Defined Chunking for Data Deduplication

Times cited: 0
Authors
Udayashankar, Sreeharsha [1]
Baba, Abdelrahman [1]
Al-Kiswany, Samer [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Data Chunking; Content-Defined Chunking; Content-Defined Skipping; Data Deduplication; HIGH-PERFORMANCE; ALGORITHM
DOI
10.1145/3652892.3700766
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Data deduplication is critical to cloud storage providers and is widely employed to conserve server-side storage space. Data chunking is an important aspect of deduplication, being directly responsible for storage space savings and end-to-end system throughput. While deduplication systems deployed in production favor larger chunk sizes, existing data chunking algorithms are slow and offer minimal throughput increases with increasing chunk size. We present SeqCDC, a chunking algorithm that leverages content-based data skipping and lightweight boundary judgement to improve chunking throughputs. SeqCDC's chunking throughput is higher at larger chunk sizes. Our evaluation shows that SeqCDC can improve chunking throughput by 1.5x to 3.1x over the state-of-the-art while achieving similar space savings benefits, across a variety of datasets.
Pages: 292-298
Page count: 7