An End-to-end High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems

Authors
Zhao, Nannan [1]
Lin, Muhui [2]
Albahar, Hadeel [3]
Paul, Arnab K. [4]
Huang, Zhijie [5]
Abraham, Subil [6]
Chen, Keren [7]
Tarasov, Vasily [8]
Skourtis, Dimitrios [8]
Anwar, Ali [9]
Butt, Ali R. [7]
Affiliations
[1] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Xian, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Sabah Al Salem Univ City, Kuwait Univ, Kuwait, Kuwait
[4] BITS Pilani, KK Birla Goa Campus, Zuarinagar 403726, Goa, India
[5] Northwestern Polytech Univ, Xian 710129, Shaanxi, Peoples R China
[6] Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA
[7] Virginia Tech, Blacksburg, VA 24061 USA
[8] IBM Res Almaden, San Jose, CA 95120 USA
[9] Univ Minnesota, Twin Cities Campus, Minneapolis, MN 55455 USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Docker registry; Docker storage driver; Linux file system; deduplication
DOI
10.1145/3643819
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on two parts of the infrastructure: container registries, which store and distribute images, and container storage systems on the Docker client side, which manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images. Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the cost of deduplication itself. We propose DupHunter, an end-to-end deduplication scheme that deduplicates layers for both Docker registries and container storage systems while maintaining high image distribution speed and container I/O performance. DupHunter is divided into three tiers: a registry tier, a middle tier, and a client tier. Specifically, we first build a high-performance deduplication engine at the registry tier that not only natively deduplicates layers for space savings but also reduces layer restore overhead. Then, we use deduplication offloading at the middle tier to eliminate redundant files from the client tier and avoid imposing deduplication overhead on the clients. To further reduce the duplicate data caused by CoW and improve container I/O performance, we utilize a container-aware storage system at the client tier that reserves space for each container and arranges the placement of files and their modifications on disk to preserve locality. Under real workloads, DupHunter reduces storage space by up to 6.9x and reduces GET layer latency by up to 2.8x compared to the state of the art. Moreover, DupHunter improves container I/O performance by up to 93% for reads and 64% for writes.
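The abstract names two opposing costs: file-level deduplication saves space, but a deduplicated layer must be reconstructed before it can be served. The following is a minimal Python sketch of that trade-off only. It is not DupHunter's implementation; the class `FileDedupStore`, its method names, and the in-memory dictionaries are all hypothetical and chosen for illustration, and SHA-256 is assumed as the content fingerprint.

```python
import hashlib


class FileDedupStore:
    """Toy content-addressed store (hypothetical; not DupHunter's design).

    Each unique file content is kept once; a layer is a "recipe" that
    maps file paths to content digests.
    """

    def __init__(self):
        self.blobs = {}    # digest -> file bytes, stored once per unique content
        self.recipes = {}  # layer id -> {path: digest}

    def put_layer(self, layer_id, files):
        """Deduplicate a layer given as a {path: bytes} mapping."""
        recipe = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Keep the bytes only if this content has not been seen before.
            self.blobs.setdefault(digest, data)
            recipe[path] = digest
        self.recipes[layer_id] = recipe

    def get_layer(self, layer_id):
        """Restore a layer by looking up every file named in its recipe.

        This per-file reassembly is the reconstruction overhead that a
        GET request pays when layers are stored deduplicated.
        """
        recipe = self.recipes[layer_id]
        return {path: self.blobs[digest] for path, digest in recipe.items()}

    def stored_bytes(self):
        """Total bytes held after deduplication."""
        return sum(len(data) for data in self.blobs.values())


if __name__ == "__main__":
    store = FileDedupStore()
    # Two layers that share one large file, as images often do.
    layer_a = {"bin/sh": b"x" * 100, "etc/a.conf": b"a"}
    layer_b = {"bin/sh": b"x" * 100, "etc/b.conf": b"b"}
    store.put_layer("A", layer_a)
    store.put_layer("B", layer_b)
    raw = sum(len(d) for layer in (layer_a, layer_b) for d in layer.values())
    print("raw bytes:", raw, "stored bytes:", store.stored_bytes())
```

In this sketch the shared file is stored once, so space shrinks, while every `get_layer` call has to walk the recipe. A real registry would additionally deal with compressed tarballs, persistence, and concurrency, none of which are modeled here.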
Pages: 35