Content-Based Chunk Placement Scheme for Decentralized Deduplication on Distributed File Systems

Authors
Kim, Keonwoo [1]
Kim, Jeehong [1]
Min, Changwoo [1]
Eom, Young Ik [1]
Affiliation
[1] Sungkyunkwan University, College of Information and Communication Engineering, Suwon, South Korea
Keywords
Deduplication; Distributed file system; Chunk placement; Consistent hashing
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The rapid growth of data volume causes several problems, such as storage limitations and increased data management costs. Distributed File Systems (DFSs) are widely used to store and manage massive data. In addition, data deduplication schemes, which increase the available storage capacity by eliminating duplicated data, are being extensively studied as a way to reduce storage volume. However, the deduplication process incurs performance overhead, such as additional disk I/O. In this paper, we propose a content-based chunk placement scheme to increase the deduplication rate on a DFS. To avoid the performance overhead caused by the deduplication process, we use lessfs in each chunk server. With this design, our system performs deduplication in a decentralized manner, within each chunk server. Moreover, we use consistent hashing for chunk allocation and failure recovery. Our experimental results show that the proposed system reduces storage space by 60% compared with the system without consistent hashing.
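The central idea in the abstract is that content-based placement routes every chunk to a chunk server according to the chunk's content. Identical chunks therefore land on the same server, where that server's local deduplication (lessfs in the paper) can eliminate them. The example below sketches this with consistent hashing. It is a minimal illustration under stated assumptions and is not the authors' implementation. The following choices are all hypothetical: SHA-1 content fingerprints, 64 virtual nodes per server, the names `ConsistentHashRing`, `fingerprint`, and `locate`, and the server IDs `cs1` through `cs4`. The sketch also shows why consistent hashing suits failure recovery: removing a server reassigns only the chunks that server held.

```python
import bisect
import hashlib


def fingerprint(data: bytes) -> int:
    """Content fingerprint of a chunk (SHA-1, assumed), used as a ring position."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ConsistentHashRing:
    """Hypothetical ring mapping chunk fingerprints to chunk servers."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes  # virtual nodes per server (assumed value)
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> server name
        for s in servers:
            self.add_server(s)

    def add_server(self, server):
        # Place each virtual node of the server on the ring.
        for i in range(self.vnodes):
            p = fingerprint(f"{server}#{i}".encode())
            bisect.insort(self._points, p)
            self._owner[p] = server

    def remove_server(self, server):
        # Simulates a failed server: its virtual nodes leave the ring,
        # so its chunks fall to the next server clockwise.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def locate(self, chunk: bytes) -> str:
        """Return the server responsible for this chunk's content."""
        h = fingerprint(chunk)
        i = bisect.bisect_right(self._points, h) % len(self._points)
        return self._owner[self._points[i]]


ring = ConsistentHashRing(["cs1", "cs2", "cs3", "cs4"])
chunks = [f"chunk-{i}".encode() for i in range(1000)]

# Identical content always maps to the same server, so that server's
# local deduplication can detect the duplicate.
assert ring.locate(b"same bytes") == ring.locate(b"same bytes")

before = {c: ring.locate(c) for c in chunks}
ring.remove_server("cs2")
after = {c: ring.locate(c) for c in chunks}
moved = [c for c in chunks if before[c] != after[c]]

# Only the chunks that lived on the failed server cs2 are reassigned.
assert all(before[c] == "cs2" for c in moved)
```

Virtual nodes spread each server's share of the ring across many positions. When a server fails, its chunks are therefore redistributed among several surviving servers instead of all falling on a single neighbor.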
Pages: 173-183
Number of pages: 11