Offline Selective Data Deduplication for Primary Storage Systems

Cited by: 1
Authors
Park, Sejin [1]
Park, Chanik [1]
Affiliations
[1] POSTECH, Pohang 790784, South Korea
Source
Keywords
data deduplication; selective deduplication; rank based deduplication
DOI
10.1587/transinf.2015EDP7034
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunately, data deduplication increases sequential-read latency. When a file is created, the file system allocates physically contiguous blocks to keep sequential-read latency low. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks, which breaks the physical sequentiality of the blocks in a file. A sequential-read request then becomes slower because it behaves like a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. The selective scheme can achieve a high deduplication ratio and low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by the recent access time and the access frequency of each file. No chunking is applied to update-intensive files because deduplicating them brings little benefit. For sequential-read-intensive files, big chunking is applied to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves a maximum of 86% of the ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
Pages: 370-382
Page count: 13
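
The selective scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FileStats` counters, the `choose_policy` rule (dominant access type wins), and the chunk sizes are hypothetical stand-ins, since the abstract states only that files are characterized by recent access time and access frequency. For brevity, deduplication here uses fixed-size chunks and looks for duplicates within a single buffer rather than across a whole storage system.

```python
from dataclasses import dataclass
from enum import Enum
import hashlib


class Policy(Enum):
    NO_CHUNKING = "none"    # update-intensive: skip deduplication
    BIG_CHUNKS = "big"      # sequential-read-intensive: preserve on-media layout
    SMALL_CHUNKS = "small"  # random-read-intensive: find more duplicates


@dataclass
class FileStats:
    """Hypothetical per-file access counters (an assumption, not the paper's structure)."""
    updates: int
    sequential_reads: int
    random_reads: int


# Illustrative chunk sizes in bytes; the abstract does not give the real values.
CHUNK_SIZE = {Policy.BIG_CHUNKS: 64 * 1024, Policy.SMALL_CHUNKS: 4 * 1024}


def choose_policy(stats: FileStats) -> Policy:
    """Pick the policy for the dominant access type; ties favor less chunking."""
    if stats.updates >= max(stats.sequential_reads, stats.random_reads):
        return Policy.NO_CHUNKING
    if stats.sequential_reads >= stats.random_reads:
        return Policy.BIG_CHUNKS
    return Policy.SMALL_CHUNKS


def unique_bytes(data: bytes, policy: Policy) -> int:
    """Return the bytes left after fixed-size chunking and hash-based deduplication."""
    if policy is Policy.NO_CHUNKING:
        return len(data)
    size = CHUNK_SIZE[policy]
    seen = set()
    kept = 0
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:  # store only the first copy of each chunk
            seen.add(digest)
            kept += len(chunk)
    return kept
```

Calling `unique_bytes` on the same buffer with `Policy.SMALL_CHUNKS` and then `Policy.BIG_CHUNKS` shows the trade-off the abstract describes: smaller chunks usually remove more redundancy, while larger chunks keep longer runs of a file physically contiguous.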