A new content-defined chunking algorithm for data deduplication in cloud storage

Cited by: 36
Authors
Widodo, Ryan N. S. [1]
Lim, Hyotaek [2]
Atiquzzaman, Mohammed [3]
Affiliations
[1] Dongseo Univ, Dept Ubiquitous IT, Busan 617716, South Korea
[2] Dongseo Univ, Div Comp Engn, Busan 617716, South Korea
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Funding
National Research Foundation, Singapore;
Keywords
Data deduplication; Cloud storage; Content-defined chunking; Hash-less chunking; Asymmetric window;
DOI
10.1016/j.future.2017.02.013
Chinese Library Classification number
TP301 [Theory, methods];
Subject classification code
081202;
Abstract
Chunking is the process of splitting a file into smaller pieces called chunks. In applications such as remote data compression, data synchronization, and data deduplication, chunking is important because it determines the duplicate detection performance of the system. Content-defined chunking (CDC) splits files into variable-length chunks whose cut points are defined by internal features of the files. Unlike fixed-length chunks, variable-length chunks are more resistant to byte shifting, so CDC increases the probability of finding duplicate chunks within a file and between files. However, CDC algorithms require additional computation to find the cut points, which can be computationally expensive for some applications. In our previous work (Widodo et al., 2016), the hash-based CDC algorithm used in the system took more processing time than the other processes in the deduplication system. This paper proposes a high-throughput hash-less chunking method called Rapid Asymmetric Maximum (RAM). Instead of using hashes, RAM uses byte values to declare the cut points. The algorithm uses a fixed-size window and a variable-size window to find a maximum-valued byte, which is the cut point. The maximum-valued byte is included in the chunk and located at the boundary of the chunk. This configuration allows RAM to perform fewer comparisons while retaining the CDC property. We compared RAM with existing hash-based and hash-less deduplication systems. The experimental results show that the proposed algorithm has higher throughput and more bytes saved per second than the other chunking algorithms. © 2017 Elsevier B.V. All rights reserved.
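The windowing idea described in the abstract can be illustrated with a minimal Python sketch. It is based only on the abstract's description and is not the authors' reference implementation. The function names `ram_cut_point` and `ram_chunks`, the `window` parameter, the `>=` comparison, and the handling of a short final chunk are all assumptions made for illustration.

```python
def ram_cut_point(data: bytes, start: int, window: int) -> int:
    """Return the exclusive end index of the chunk that begins at `start`.

    Assumption: the fixed-size window covers the first `window` bytes of the
    chunk, and the variable-size window extends until a byte is found that is
    at least as large as the maximum seen in the fixed-size window.
    """
    fixed_end = min(start + window, len(data))
    if fixed_end == len(data):
        # Not enough data left for a variable-size window: emit the remainder.
        return len(data)
    max_byte = max(data[start:fixed_end])  # maximum of the fixed-size window
    for i in range(fixed_end, len(data)):  # scan the variable-size window
        if data[i] >= max_byte:
            # The maximum-valued byte is kept as the last byte of the chunk.
            return i + 1
    return len(data)


def ram_chunks(data: bytes, window: int = 4) -> list[bytes]:
    """Split `data` into variable-length chunks using ram_cut_point."""
    chunks = []
    start = 0
    while start < len(data):
        cut = ram_cut_point(data, start, window)
        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    original = bytes([1, 5, 2, 3, 4, 9, 1, 1, 1, 1, 2, 0])
    shifted = b"\x00" + original  # simulate a one-byte insertion at the front
    print(ram_chunks(original))
    print(ram_chunks(shifted))
```

In this small input, inserting one byte at the front changes only the first chunk; the later chunks are unchanged and would still be detected as duplicates, which is the byte-shifting resistance the abstract attributes to CDC. Each byte is compared against a single running maximum, with no hash computed.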
Pages: 145 - 156
Number of pages: 12
Related papers
50 items in total
  • [41] ClouDedup: Secure Deduplication with Encrypted Data for Cloud Storage
    Puzio, Pasquale
    Molva, Refik
    Oenen, Melek
    Loureiro, Sergio
    2013 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), VOL 1, 2013, : 363 - 370
  • [42] Boafft: Distributed Deduplication for Big Data Storage in the Cloud
    Luo, Shengmei
    Zhang, Guangyan
    Wu, Chengwen
    Khan, Samee U.
    Li, Keqin
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (04) : 1199 - 1211
  • [43] Verifiable Secure Data Deduplication Method in Cloud Storage
    Xian H.-Q.
    Liu H.-Y.
    Zhang S.-G.
    Hou R.-T.
    Xian, He-Qun (xianhq@126.com), 1600, Chinese Academy of Sciences (31): : 455 - 470
  • [44] A Data Structure for Efficient File Deduplication in Cloud Storage
    Wang, Bohui
    Li, Hui
    Zhao, Yan
    Yang, Xin
    Ma, Huajun
    Xie, Xin
    Xing, Kaixuan
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 71 - 77
  • [45] Heterogeneous Data Storage Management with Deduplication in Cloud Computing
    Yan, Zheng
    Zhang, Lifang
    Ding, Wenxiu
    Zheng, Qinghua
    IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (03) : 393 - 407
  • [46] Group provable data possession with deduplication in cloud storage
    Wang H.-Y.
    Zhu L.-H.
    Li L.-Y.-J.
    Ruan Jian Xue Bao/Journal of Software, 2016, 27 (06): : 1417 - 1431
  • [47] A secure framework for managing data in cloud storage using rapid asymmetric maximum based dynamic size chunking and fuzzy logic for deduplication
    Rajkumar, K.
    Hariharan, U.
    Dhanakoti, V.
    Muthukumaran, N.
    WIRELESS NETWORKS, 2024, 30 (01) : 321 - 334
  • [49] Secure Data Deduplication System with Tag Consistency in Cloud Data Storage
    Patil, Pramod Gorakh
    Dixit, Aditya Rajesh
    Sharma, Aman
    Mahale, Prashant Rajendra
    Jadhav, Mayur Pundlik
    INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES (ICCNCT 2018), 2019, 15 : 119 - 124
  • [50] UCDC: Unlimited Content-Defined Chunking, A File-Differing Method Apply to File-Synchronization among Multiple Hosts
    Ma, Jihong
    Bi, Chongguang
    Bai, Yuebin
    Zhang, Lijun
    PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2016, : 76 - 82