Offline Selective Data Deduplication for Primary Storage Systems

Cited by: 1

Authors:

Park, Sejin [1]

Park, Chanik [1]

Affiliation:

[1] POSTECH, Pohang 790784, South Korea

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2016 / Vol. E99D / No. 02

Keywords:

data deduplication; selective deduplication; rank-based deduplication

DOI:

10.1587/transinf.2015EDP7034

CLC number (Chinese Library Classification):

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are the key metrics. Recently, however, data deduplication on primary storage has been receiving attention; in this setting, I/O latency must be weighed as carefully as the deduplication ratio. Unfortunately, data deduplication can increase sequential-read latency. When a file is created, the file system allocates physically contiguous blocks to keep sequential-read latency low. The deduplication process, however, rewrites the block mapping information to eliminate duplicate blocks, which breaks the physical sequentiality of the blocks in a file. A sequential-read request on such a file therefore behaves like a series of random reads and becomes slower. In this paper, we propose a selective data deduplication scheme for primary storage systems. The selective scheme achieves both a high deduplication ratio and low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by each file's recent access time and access frequency. No chunking is applied to update-intensive files, since deduplicating them yields little benefit. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, we use small chunking to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves up to 86% of the ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
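The abstract describes the selective policy only at a high level. The Python sketch below illustrates one way such a scheme could work and how chunk size affects the deduplication ratio; it is not the authors' implementation. Everything in it is a hypothetical assumption: the chunk sizes `SMALL_CHUNK` and `BIG_CHUNK`, the `AccessLog` event format, an exponential-decay weighting (`half_life`) that stands in for the paper's combination of recent access time and access frequency, the rule that a file's dominant weighted access type selects its policy, and SHA-256 fingerprints over fixed-size chunks.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical chunk sizes: the abstract does not state the real values.
SMALL_CHUNK = 4 * 1024     # 4 KiB, for random-read-intensive files
BIG_CHUNK = 128 * 1024     # 128 KiB, for sequential-read-intensive files


class Policy(Enum):
    NO_CHUNKING = "none"      # update-intensive: not deduplicated
    BIG_CHUNKING = "big"      # sequential-read-intensive: keep layout contiguous
    SMALL_CHUNKING = "small"  # random-read-intensive: favor the dedup ratio


@dataclass
class AccessLog:
    """Hypothetical per-file log of (timestamp_seconds, kind) events.

    kind is one of "update", "seq_read", "rand_read".
    """
    events: list[tuple[float, str]] = field(default_factory=list)


def weighted_counts(log: AccessLog, now: float, half_life: float = 3600.0) -> dict[str, float]:
    """Recency-weighted access frequency (assumed exponential decay).

    Each access contributes 0.5 ** (age / half_life), so frequent and
    recent accesses dominate while old ones fade out.
    """
    scores = {"update": 0.0, "seq_read": 0.0, "rand_read": 0.0}
    for t, kind in log.events:
        scores[kind] += 0.5 ** ((now - t) / half_life)
    return scores


def choose_policy(log: AccessLog, now: float, half_life: float = 3600.0) -> Policy:
    """Map a file's dominant (recency-weighted) access type to a chunking policy."""
    s = weighted_counts(log, now, half_life)
    if sum(s.values()) == 0.0:
        # Assumption: files with no recent accesses are chunked finely.
        return Policy.SMALL_CHUNKING
    dominant = max(s, key=lambda k: s[k])  # ties resolve in dict order (update first)
    if dominant == "update":
        return Policy.NO_CHUNKING
    if dominant == "seq_read":
        return Policy.BIG_CHUNKING
    return Policy.SMALL_CHUNKING


def dedup_ratio(files: dict[str, bytes], policies: dict[str, Policy]) -> float:
    """Logical bytes / stored bytes, using fixed-size chunks and SHA-256 fingerprints."""
    seen: set[bytes] = set()
    logical = stored = 0
    for name, data in files.items():
        logical += len(data)
        policy = policies[name]
        if policy is Policy.NO_CHUNKING:
            stored += len(data)  # stored whole, never compared against other chunks
            continue
        size = BIG_CHUNK if policy is Policy.BIG_CHUNKING else SMALL_CHUNK
        for off in range(0, len(data), size):
            chunk = data[off:off + size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:  # first copy is stored; later copies add no bytes
                seen.add(digest)
                stored += len(chunk)
    return logical / stored if stored else 1.0


if __name__ == "__main__":
    # Two 256 KiB files made of distinct 4 KiB blocks; file b differs in one block.
    def block(i: int) -> bytes:
        return hashlib.sha256(str(i).encode()).digest() * 128  # 32 B * 128 = 4 KiB

    a = b"".join(block(i) for i in range(64))
    b = b"".join(block(1000 if i == 5 else i) for i in range(64))
    files = {"a": a, "b": b}
    for p in Policy:
        print(p.value, round(dedup_ratio(files, {"a": p, "b": p}), 3))
```

Running the script prints `none 1.0`, `big 1.333`, and `small 1.969`. When two files differ in a single 4 KiB block, small chunking shares every block except the changed one, while the 128 KiB chunk containing the change must be stored again. This mirrors the trade-off in the abstract: big chunks give up some deduplication so that a file's blocks stay contiguous on the media.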

Pages: 370-382

Number of pages: 13