Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud

Cited by: 51
Authors
Mao, Bo [1]
Jiang, Hong [2]
Wu, Suzhen [1]
Fu, Yinjin [3]
Tian, Lei [2]
Affiliations
[1] Xiamen Univ, Xiamen 361005, Peoples R China
[2] Univ Nebraska, Lincoln, NE USA
[3] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
Funding
US National Science Foundation;
Keywords
Storage systems; data deduplication; virtual machine; solid-state drive; read performance; Design; Performance;
DOI
10.1145/2512348
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, the performance of restore operations from a deduplicated backup can be significantly lower than that without deduplication. The main reason lies in the fact that a file or block is split into multiple small data chunks that are often located in different disks after deduplication, which can cause a subsequent read operation to invoke many disk IOs involving multiple disks and thus degrade the read performance significantly. While this problem has been by and large ignored in the literature thus far, we argue that the time is ripe for us to pay significant attention to it in light of the emerging cloud storage applications and the increasing popularity of the VM platform in the cloud. This is because, in a cloud storage or VM environment, a simple read request on the client side may translate into a restore operation if the data to be read or a VM suspended by the user was previously deduplicated when written to the cloud or the VM storage server, a likely scenario considering the network bandwidth and storage capacity concerns in such an environment. To address this problem, in this article, we propose SAR, an SSD (solid-state drive)-Assisted Read scheme, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference count, small size, and nonsequential characteristics. In this way, many read requests to HDDs are replaced by read requests to SSDs, thus significantly improving the read performance of the deduplication-based storage systems in the cloud.
The extensive trace-driven and VM restore evaluations on the prototype implementation of SAR show that SAR outperforms the traditional deduplication-based and flash-based cache schemes significantly, in terms of the average response times.
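The chunk-placement idea in the abstract (keep on SSD the unique chunks that are highly referenced, small, and nonsequential) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Chunk` fields, the `select_for_ssd` function, the thresholds `min_refs` and `max_size`, and the greedy capacity fill are all hypothetical choices made for illustration, since the abstract does not give the actual policy or parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical metadata for one unique (deduplicated) chunk."""
    fingerprint: str   # content hash identifying the chunk
    size: int          # chunk size in bytes
    ref_count: int     # how many files/blocks reference this chunk
    sequential: bool   # True if the chunk is read as part of a sequential run


def select_for_ssd(chunks, ssd_capacity, min_refs=2, max_size=8192):
    """Choose chunks to store on SSD (illustrative policy, assumed thresholds).

    Keeps chunks that are highly referenced, small, and nonsequential, then
    fills the SSD greedily, preferring higher reference counts and, on ties,
    smaller chunks.
    """
    candidates = [
        c for c in chunks
        if c.ref_count >= min_refs and c.size <= max_size and not c.sequential
    ]
    candidates.sort(key=lambda c: (-c.ref_count, c.size))

    chosen, used = [], 0
    for c in candidates:
        if used + c.size <= ssd_capacity:
            chosen.append(c)
            used += c.size
    return chosen


chunks = [
    Chunk("a", 4096, 10, False),   # shared, small, random: good SSD candidate
    Chunk("b", 4096, 1, False),    # referenced only once: stays on HDD
    Chunk("c", 65536, 20, False),  # too large under the assumed threshold
    Chunk("d", 2048, 5, True),     # sequential: HDD already serves it well
    Chunk("e", 4096, 3, False),    # shared, small, random
]
on_ssd = select_for_ssd(chunks, ssd_capacity=8192)
print([c.fingerprint for c in on_ssd])
```

Reads for the selected chunks would then be served from the SSD, while sequential or rarely shared chunks remain on HDDs, where sequential access is comparatively efficient.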
Pages: 22