Offline Selective Data Deduplication for Primary Storage Systems

Cited by: 1
Authors
Park, Sejin [1]
Park, Chanik [1]
Affiliations
[1] POSTECH, Pohang 790784, South Korea
Source
Keywords
data deduplication; selective deduplication; rank based deduplication
DOI
10.1587/transinf.2015EDP7034
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunately, data deduplication causes high sequential-read-latency problems. When a file is created, the file system allocates physically contiguous blocks to support low sequential-read latency. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks. Because of this rearrangement, the physical sequentiality of blocks in a file is broken. This makes a sequential-read request slower because it operates like a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. A selective scheme can achieve a high deduplication ratio and a low I/O latency by applying different data-chunking methods to the files, according to their file access characteristics. In the proposed system, file accesses are characterized by recent access time and the access frequency of each file. No chunking is applied to update-intensive files since they are meaningless in terms of data deduplication. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves a maximum of 86% of an ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
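The selection policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the counter-based `FileStats` record, the chunk sizes, and the "dominant access type" rule are all assumptions made for illustration, since the abstract only says that files are characterized by recent access time and access frequency.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical chunk sizes; the abstract does not state the actual values.
BIG_CHUNK = 1024 * 1024   # large chunks keep long runs of blocks contiguous
SMALL_CHUNK = 4 * 1024    # small chunks expose more duplicate data


@dataclass
class FileStats:
    """Per-file access counters (an assumed representation)."""
    updates: int
    sequential_reads: int
    random_reads: int


def choose_chunk_size(stats: FileStats) -> Optional[int]:
    """Return a chunk size in bytes, or None to skip deduplication.

    Ties are resolved in favor of updates, then sequential reads;
    this tie-breaking order is an assumption of the sketch.
    """
    dominant = max(stats.updates, stats.sequential_reads, stats.random_reads)
    if stats.updates == dominant:
        return None          # update-intensive: no chunking
    if stats.sequential_reads == dominant:
        return BIG_CHUNK     # sequential-read-intensive: big chunking
    return SMALL_CHUNK       # random-read-intensive: small chunking
```

For example, a file with mostly sequential reads would be assigned `BIG_CHUNK`, so that deduplication remaps it only in large contiguous units and sequential reads stay close to native speed.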
Pages: 370-382
Page count: 13