SeqCDC: Hashless Content-Defined Chunking for Data Deduplication

Cited by: 0
Authors
Udayashankar, Sreeharsha [1]
Baba, Abdelrahman [1]
Al-Kiswany, Samer [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Data Chunking; Content-Defined Chunking; Content-Defined Skipping; Data Deduplication; HIGH-PERFORMANCE; ALGORITHM;
DOI
10.1145/3652892.3700766
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Data deduplication is critical to cloud storage providers and is widely employed to conserve server-side storage space. Data chunking is an important aspect of deduplication, being directly responsible for storage space savings and end-to-end system throughput. While deduplication systems deployed in production favor larger chunk sizes, existing data chunking algorithms are slow and offer minimal throughput increases with increasing chunk size. We present SeqCDC, a chunking algorithm that leverages content-based data skipping and lightweight boundary judgement to improve chunking throughputs. SeqCDC's chunking throughput is higher at larger chunk sizes. Our evaluation shows that SeqCDC can improve chunking throughput by 1.5x - 3.1x over the state-of-the-art while achieving similar space savings benefits, across a variety of datasets.
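The abstract names two ideas, content-based data skipping and lightweight (hashless) boundary judgement, without giving their details. The Python sketch below is only an illustration of how such a chunker might look, not the authors' implementation. It assumes that a boundary is declared after a short run of strictly increasing bytes and that the scan jumps ahead once enough non-increasing byte pairs have been seen. The function name and every parameter (`min_size`, `max_size`, `seq_len`, `skip_trigger`, `skip_size`) are hypothetical and do not come from this record.

```python
def chunk_boundaries(data: bytes, min_size: int = 64, max_size: int = 1024,
                     seq_len: int = 3, skip_trigger: int = 8,
                     skip_size: int = 32) -> list[int]:
    """Return the end offset of every chunk in `data`.

    Illustrative assumptions (not taken from the record):
    - a boundary is placed after `seq_len` consecutive strictly
      increasing byte pairs, so no hash is computed;
    - after `skip_trigger` non-increasing pairs the scan jumps
      `skip_size` bytes ahead (content-based skipping);
    - chunks are capped at `max_size` bytes.
    """
    if min_size < 1 or max_size <= min_size:
        raise ValueError("require 1 <= min_size < max_size")

    boundaries = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)   # hard upper limit for this chunk
        cut = end                        # fallback if no boundary is found
        i = start + min_size             # never cut inside the minimum size
        run = 0                          # current increasing-pair run length
        opposing = 0                     # non-increasing pairs since last skip
        while i < end:
            if data[i] > data[i - 1]:
                run += 1
                if run >= seq_len:       # lightweight boundary judgement
                    cut = i + 1
                    break
            else:
                run = 0
                opposing += 1
                if opposing >= skip_trigger:
                    i += skip_size       # region looks unpromising: skip it
                    opposing = 0
            i += 1
        boundaries.append(cut)
        start = cut
    return boundaries


def split_chunks(data: bytes, **params) -> list[bytes]:
    """Cut `data` at the offsets chosen by chunk_boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data, **params):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Because the boundary decision depends only on neighbouring byte values, the sketch performs one comparison per scanned byte and none for skipped bytes. That is the general reason skipping can raise throughput, though the numbers reported in the abstract come from the authors' own design and evaluation.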
Pages: 292-298
Page count: 7
Related papers
50 in total
  • [1] Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump
    Jin, Xiaozhong
    Liu, Haikun
    Ye, Chencheng
    Liao, Xiaofei
    Jin, Hai
    Zhang, Yu
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (09) : 2568 - 2579
  • [2] A new content-defined chunking algorithm for data deduplication in cloud storage
    Widodo, Ryan N. S.
    Lim, Hyotaek
    Atiquzzaman, Mohammed
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 71 : 145 - 156
  • [3] FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
    Xia, Wen
    Zhou, Yukun
    Jiang, Hong
    Feng, Dan
    Hua, Yu
    Hu, Yuchong
    Zhang, Yucheng
    Liu, Qing
    PROCEEDINGS OF USENIX ATC '16: 2016 USENIX ANNUAL TECHNICAL CONFERENCE, 2016, : 101 - 114
  • [4] A smart hybrid content-defined chunking algorithm for data deduplication in cloud storage
    Ellappan, Manogar
    Murugappan, Abirami
    SOFT COMPUTING, 2023, 28 (15-16) : 9037 - 9052
  • [5] The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems
    Xia, Wen
    Zou, Xiangyu
    Jiang, Hong
    Zhou, Yukun
    Liu, Chuanyi
    Feng, Dan
    Hua, Yu
    Hu, Yuchong
    Zhang, Yucheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (09) : 2017 - 2031
  • [6] Implementing Content-Defined Chunking for Deduplication in Host-Managed SSDs
    Chen, Che-Min
    Shih, Yi-Chao
    Liu, Xin
    Shih, Wei-Kuan
    Chen, Tseng-Yi
    2024 IEEE THE 20TH ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, APCCAS 2024, 2024, : 159 - 163
  • [7] Data Deduplication System Based on Content-Defined Chunking Using Bytes Pair Frequency Occurrence
    Saeed, Ahmed Sardar M.
    George, Loay E.
    SYMMETRY-BASEL, 2020, 12 (11): : 1 - 21
  • [8] Blockchain-based data deduplication using novel content-defined chunking algorithm in cloud environment
    Prakash, J. Jabin
    Ramesh, K.
    Saravanan, K.
    Prabha, G. Lakshmi
    INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, 2023,
  • [9] Blockchain-based data deduplication using novel content-defined chunking algorithm in cloud environment
    Prakash, Jabin J.
    Ramesh, K.
    Saravanan, K.
    Prabha, Lakshmi G.
    INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT, 2023, 33 (06)
  • [10] SuperCDC: A Hybrid Design of High-Performance Content-Defined Chunking for Fast Deduplication
    Wan, Binzhaoshuo
    Pu, Lifeng
    Zou, Xiangyu
    Li, Shiyi
    Wang, Peng
    Xia, Wen
    2022 IEEE 40TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2022), 2022, : 170 - 178