A new content-defined chunking algorithm for data deduplication in cloud storage

Cited by: 36
Authors
Widodo, Ryan N. S. [1]
Lim, Hyotaek [2]
Atiquzzaman, Mohammed [3]
Affiliations
[1] Dongseo Univ, Dept Ubiquitous IT, Busan 617716, South Korea
[2] Dongseo Univ, Div Comp Engn, Busan 617716, South Korea
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Funding
National Research Foundation of Singapore;
Keywords
Data deduplication; Cloud storage; Content-defined chunking; Hash-less chunking; Asymmetric window;
DOI
10.1016/j.future.2017.02.013
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Chunking is a process to split a file into smaller files called chunks. In some applications, such as remote data compression, data synchronization, and data deduplication, chunking is important because it determines the duplicate detection performance of the system. Content-defined chunking (CDC) is a method to split files into variable-length chunks, where the cut points are defined by some internal features of the files. Unlike fixed-length chunks, variable-length chunks are more resistant to byte shifting. Thus, CDC increases the probability of finding duplicate chunks within a file and between files. However, CDC algorithms require additional computation to find the cut points, which might be computationally expensive for some applications. In our previous work (Widodo et al., 2016), the hash-based CDC algorithm used in the system took more processing time than the other processes in the deduplication system. This paper proposes a high-throughput hash-less chunking method called Rapid Asymmetric Maximum (RAM). Instead of using hashes, RAM uses byte values to declare the cut points. The algorithm utilizes a fixed-size window and a variable-size window to find a maximum-valued byte, which is the cut point. The maximum-valued byte is included in the chunk and located at the boundary of the chunk. This configuration allows RAM to do fewer comparisons while retaining the CDC property. We compared RAM with existing hash-based and hash-less deduplication systems. The experimental results show that our proposed algorithm has higher throughput and bytes saved per second compared to other chunking algorithms. © 2017 Elsevier B.V. All rights reserved.
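The windowing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ram_chunks`, the `window` parameter, its default value, and the use of `>=` for the comparison are assumptions made for illustration. The sketch takes the maximum byte of a fixed-size window at the start of each chunk, then extends a variable-size window byte by byte until it meets a byte at least that large, and cuts immediately after that byte so it sits at the chunk boundary.

```python
def ram_chunks(data: bytes, window: int = 4) -> list[bytes]:
    """Hash-less chunking sketch in the spirit of RAM (hypothetical helper)."""
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        # Fixed-size window: only its maximum byte value is needed.
        fixed_end = min(start + window, n)
        max_value = max(data[start:fixed_end])
        # Variable-size window: grow until a byte reaches the fixed-window
        # maximum (assumption: ">=" rather than ">").
        cut = n  # if no such byte exists, the rest of the data is one chunk
        for i in range(fixed_end, n):
            if data[i] >= max_value:
                cut = i + 1  # the maximum-valued byte is included in the chunk
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


sample = bytes([1, 5, 3, 2, 4, 6, 1, 1, 1, 1, 9, 0])
pieces = ram_chunks(sample, window=4)
```

Only one comparison per byte is made outside the fixed window and no hash is computed, which is consistent with the abstract's claim of fewer comparisons. A practical system would likely also bound the maximum chunk size, which this sketch omits.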
Pages: 145-156
Number of pages: 12
Related papers
50 in total
  • [31] Health Data Deduplication Using Window Chunking-Signature Encryption in Cloud
    Neelamegam, G.
    Marikkannu, P.
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (01) : 1079 - 1093
  • [32] Research on cloud storage biological data deduplication method based on Simhash algorithm
    Du, Haijuan
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2023, 27 (04) : 252 - 266
  • [33] A generic integrity verification algorithm of version files for cloud deduplication data storage
    Xu, Guangwei
    Lai, Miaolin
    Li, Jing
    Sun, Li
    Shi, Xiujin
    EURASIP JOURNAL ON INFORMATION SECURITY, 2018,
  • [34] Improving Data Availability for Deduplication in Cloud Storage
    Li, Jun
    Hou, Mengshu
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (02) : 70 - 89
  • [35] Data Deduplication for Efficient Cloud Storage and Retrieval
    Misal, Rishikesh
    Perumal, Boominathan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (05) : 922 - 927
  • [36] Data deduplication mechanism for cloud storage systems
    Xu, Xiaolong
    Tu, Qun
    2015 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY, 2015 : 286 - 294
  • [37] Deduplication scheme with data popularity for cloud storage
    He X.
    Yang Q.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2024, 51 (01) : 187 - 200
  • [38] Survey on Data Deduplication in Cloud Storage Environments
    Kim, Won-Bin
    Lee, Im-Yeong
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (03) : 658 - 673
  • [39] A Secure Data Deduplication Scheme for Cloud Storage
    Stanek, Jan
    Sorniotti, Alessandro
    Androulaki, Elli
    Kencl, Lukas
    FINANCIAL CRYPTOGRAPHY AND DATA SECURITY, FC 2014, 2014, 8437 : 99 - 118
  • [40] A Proposal for Improving Data DeDuplication with Dual Side Fixed Size Chunking Algorithm
    Krishnaprasad, P. K.
    Narayamparambil, Biju Abraham
    2013 THIRD INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING AND COMMUNICATIONS (ICACC 2013), 2013 : 13 - 16