Offline Selective Data Deduplication for Primary Storage Systems

Cited by: 1
Authors
Park, Sejin [1]
Park, Chanik [1]
Affiliations
[1] POSTECH, Pohang 790784, South Korea
Source
Keywords
data deduplication; selective deduplication; rank-based deduplication
DOI
10.1587/transinf.2015EDP7034
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunately, data deduplication increases sequential-read latency. When a file is created, the file system allocates physically contiguous blocks to support low sequential-read latency. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks. Because of this rearrangement, the physical sequentiality of blocks in a file is broken, so a sequential-read request behaves like a random-read operation and becomes slower. In this paper, we propose a selective data deduplication scheme for primary storage systems. A selective scheme can achieve a high deduplication ratio and a low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by the recent access time and the access frequency of each file. No chunking is applied to update-intensive files, since deduplicating frequently updated data is not worthwhile. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves up to 86% of the ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
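The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FileStats`, `choose_chunking`, and `dedup_chunks`, the dominant-counter classification, the fixed-size chunking, and the 1 MiB and 4 KiB chunk sizes are all assumptions made for illustration. The abstract states only that files are characterized by recent access time and access frequency, and that update-intensive, sequential-read-intensive, and random-read-intensive files receive no chunking, big chunking, and small chunking, respectively.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Chunking(Enum):
    NONE = "none"    # update-intensive files: skip deduplication
    BIG = "big"      # sequential-read-intensive files: keep layout contiguous
    SMALL = "small"  # random-read-intensive files: maximize duplicate detection


# Hypothetical chunk sizes; the abstract does not give concrete values.
CHUNK_SIZE = {Chunking.BIG: 1 << 20, Chunking.SMALL: 4096}


@dataclass
class FileStats:
    """Hypothetical per-file access counters collected before the offline pass."""
    writes: int
    sequential_reads: int
    random_reads: int


def choose_chunking(stats: FileStats) -> Chunking:
    """Pick a chunking method from the dominant access type (assumed rule)."""
    if stats.writes >= max(stats.sequential_reads, stats.random_reads):
        return Chunking.NONE  # ties, including never-accessed files, land here
    if stats.sequential_reads >= stats.random_reads:
        return Chunking.BIG
    return Chunking.SMALL


def dedup_chunks(data: bytes, policy: Chunking, store: dict) -> list:
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns the list of chunk digests that reconstructs the data, or an
    empty list when the policy is NONE (the file is left untouched).
    """
    if policy is Chunking.NONE:
        return []
    size = CHUNK_SIZE[policy]
    refs = []
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        refs.append(digest)
    return refs
```

Under these assumptions, a file with mostly random reads is split into 4 KiB chunks, so two identical 4 KiB regions share one stored chunk, whereas a sequentially read file is handled in 1 MiB units and keeps longer contiguous runs at the cost of fewer detected duplicates.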
Pages: 370-382
Number of pages: 13