Content-Based Chunk Placement Scheme for Decentralized Deduplication on Distributed File Systems

Authors
Kim, Keonwoo [1]
Kim, Jeehong [1]
Min, Changwoo [1]
Eom, Young Ik [1]
Affiliations
[1] Sungkyunkwan Univ, Coll Informat & Commun Engn, Suwon, South Korea
Keywords
Deduplication; Distributed file system; Chunk placement; Consistent hashing
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
The rapid growth of data volume causes several problems, such as limited storage capacity and rising data management costs. Distributed file systems (DFSs) are widely used to store and manage massive data, and data deduplication schemes are being studied extensively to reduce storage volume. Deduplication increases the available storage capacity by eliminating duplicate data. However, the deduplication process incurs performance overhead, such as additional disk I/O. In this paper, we propose a content-based chunk placement scheme that increases the deduplication rate on a DFS. To avoid the performance overhead of the deduplication process, we run lessfs on each chunk server. With this design, the system performs deduplication in a decentralized manner, on each chunk server. Moreover, we use consistent hashing for chunk allocation and failure recovery. Our experimental results show that the proposed system uses 60% less storage space than a system without consistent hashing.
Pages: 173-183
Number of pages: 11
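
The abstract's central idea, placing each chunk by a hash of its content so that identical chunks always reach the same chunk server, can be illustrated with a minimal consistent-hashing sketch. This is not the authors' implementation, which this record does not show: the class name `ConsistentHashRing`, the server names, the use of SHA-1, and the number of virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(data: bytes) -> int:
    """Map bytes to a ring position. SHA-1 is an assumption, not from the paper."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ConsistentHashRing:
    """Hypothetical ring that maps chunk content to a chunk server."""

    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes   # virtual nodes per server (assumed value)
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns several points so that load spreads more evenly.
        for i in range(self._vnodes):
            point = ring_hash(f"{server}#{i}".encode())
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove_server(self, server: str) -> None:
        # Simulates a failed server: drop its points from the ring.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def place(self, chunk: bytes) -> str:
        # The chunk goes to the first server point clockwise from its hash.
        h = ring_hash(chunk)
        i = bisect.bisect_right(self._points, h) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cs1", "cs2", "cs3", "cs4"])
    chunks = [f"block-{i}".encode() for i in range(1000)]
    before = {c: ring.place(c) for c in chunks}
    ring.remove_server("cs3")
    moved = sum(1 for c in chunks if ring.place(c) != before[c])
    print(f"{moved} of {len(chunks)} chunks were reassigned")
```

Because placement depends only on chunk content, duplicate chunks written by different clients land on the same server, where a local deduplicating file system (lessfs in the paper) can eliminate them without cross-server coordination. When a server is removed, only the chunks it owned are reassigned, which is the property that makes consistent hashing attractive for failure recovery.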