Offline Selective Data Deduplication for Primary Storage Systems

Authors
Park, Sejin [1]
Park, Chanik [1]
Affiliation
[1] POSTECH, Pohang 790784, South Korea
Source
Keywords
data deduplication; selective deduplication; rank-based deduplication
DOI
10.1587/transinf.2015EDP7034
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data deduplication eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. Recently, however, data deduplication on primary storage has been receiving attention; in this case, I/O latency must be weighed equally with the deduplication ratio. Unfortunately, data deduplication increases sequential-read latency. When a file is created, the file system allocates physically contiguous blocks to keep sequential-read latency low. The deduplication process, however, rearranges the block mapping information to eliminate duplicate blocks, and this rearrangement breaks the physical sequentiality of the blocks in a file. A sequential-read request then becomes slower because it behaves like a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. The selective scheme achieves a high deduplication ratio and low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by the recent access time and the access frequency of each file. No chunking is applied to update-intensive files, since deduplicating them is of little value. For sequential-read-intensive files, big chunking is applied to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves up to 86% of the ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
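The three-way policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says only that files are characterized by recent access time and access frequency, so the `FileStats` counters, the `choose_chunking` function, and the "dominant access type wins" rule are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Chunking(Enum):
    NONE = "none"    # skip deduplication for the file
    BIG = "big"      # large chunks keep blocks physically contiguous
    SMALL = "small"  # small chunks expose more duplicate blocks


@dataclass
class FileStats:
    # Hypothetical per-file counters; the abstract does not specify
    # how access recency and frequency are actually recorded.
    updates: int
    sequential_reads: int
    random_reads: int


def choose_chunking(stats: FileStats) -> Chunking:
    """Pick a chunking method from the dominant access type (assumed rule)."""
    counts = {
        Chunking.NONE: stats.updates,           # update-intensive
        Chunking.BIG: stats.sequential_reads,   # sequential-read-intensive
        Chunking.SMALL: stats.random_reads,     # random-read-intensive
    }
    # max() returns the first maximal key, so ties resolve in
    # insertion order: NONE, then BIG, then SMALL (an arbitrary choice).
    return max(counts, key=counts.get)


if __name__ == "__main__":
    log_file = FileStats(updates=40, sequential_reads=3, random_reads=1)
    video = FileStats(updates=0, sequential_reads=25, random_reads=2)
    database = FileStats(updates=2, sequential_reads=4, random_reads=30)
    for name, stats in [("log", log_file), ("video", video), ("db", database)]:
        print(name, choose_chunking(stats).value)
```

A real system would presumably also weight these counters by recency, as the abstract suggests, rather than compare raw totals.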
Pages: 370-382
Page count: 13