Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud

Cited by: 51

Authors:

Mao, Bo [1]

Jiang, Hong [2]

Wu, Suzhen [1]

Fu, Yinjin [3]

Tian, Lei [2]

Affiliations:

[1] Xiamen Univ, Xiamen 361005, Peoples R China

[2] Univ Nebraska, Lincoln, NE USA

[3] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China

Source:

ACM TRANSACTIONS ON STORAGE | 2014 / Vol. 10 / No. 02

Funding:

National Science Foundation (USA);

Keywords:

Storage systems; data deduplication; virtual machine; solid-state drive; read performance; Design; Performance;

DOI:

10.1145/2512348

Chinese Library Classification number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, the performance of restore operations from a deduplicated backup can be significantly lower than that without deduplication. The main reason lies in the fact that a file or block is split into multiple small data chunks that are often located in different disks after deduplication, which can cause a subsequent read operation to invoke many disk IOs involving multiple disks and thus degrade the read performance significantly. While this problem has been by and large ignored in the literature thus far, we argue that the time is ripe for us to pay significant attention to it in light of the emerging cloud storage applications and the increasing popularity of the VM platform in the cloud. This is because, in a cloud storage or VM environment, a simple read request on the client side may translate into a restore operation if the data to be read or a VM suspended by the user was previously deduplicated when written to the cloud or the VM storage server, a likely scenario considering the network bandwidth and storage capacity concerns in such an environment. To address this problem, in this article, we propose SAR, an SSD (solid-state drive)-Assisted Read scheme, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference count, small size, and nonsequential characteristics. In this way, many read requests to HDDs are replaced by read requests to SSDs, thus significantly improving the read performance of the deduplication-based storage systems in the cloud. The extensive trace-driven and VM restore evaluations on the prototype implementation of SAR show that SAR outperforms the traditional deduplication-based and flash-based cache schemes significantly, in terms of the average response times.
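
The selection criterion named in the abstract (unique chunks with high reference count, small size, and nonsequential placement go to SSD) can be illustrated with a minimal sketch. This is not the SAR implementation: the `Chunk` fields, the `select_for_ssd` function, the thresholds `min_refs` and `max_size`, and the greedy capacity fill are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    fingerprint: str   # content hash identifying the unique chunk
    size: int          # chunk size in bytes
    ref_count: int     # how many files/blocks reference this chunk
    sequential: bool   # True if the chunk lies in a sequential run on HDD


def select_for_ssd(chunks, ssd_capacity, min_refs=2, max_size=8192):
    """Pick chunks for SSD placement (thresholds are assumed, not from the paper)."""
    # Keep only highly referenced, small, nonsequential chunks.
    candidates = [
        c for c in chunks
        if c.ref_count >= min_refs and c.size <= max_size and not c.sequential
    ]
    # Prefer higher reference counts; break ties by smaller size.
    candidates.sort(key=lambda c: (-c.ref_count, c.size))
    selected, used = [], 0
    for c in candidates:
        # Greedy fill: skip any chunk that would exceed the SSD budget.
        if used + c.size <= ssd_capacity:
            selected.append(c)
            used += c.size
    return selected


chunks = [
    Chunk("a", 4096, 10, False),
    Chunk("b", 4096, 1, False),    # referenced once: stays on HDD
    Chunk("c", 65536, 20, False),  # too large for the assumed size limit
    Chunk("d", 4096, 5, True),     # sequential on HDD: stays on HDD
    Chunk("e", 2048, 10, False),
]
picked = select_for_ssd(chunks, ssd_capacity=6144)
print([c.fingerprint for c in picked])  # ['e', 'a']
```

In this toy input, only chunks `a` and `e` satisfy all three conditions, so reads of those chunks would be served from SSD while the rest remain on HDD.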


Pages: 22
