Data deduplication with edit errors

Cited by: 0
Authors
Conde-Canencia, Laura [1]
Condie, Tyson [2]
Dolecek, Lara [3]
Affiliations
[1] Univ Bretagne Sud, CNRS UMR 6285, Lab STICC, Lorient, France
[2] Univ Calif Los Angeles, CS Dept, Los Angeles, CA USA
[3] Univ Calif Los Angeles, ECE Dept, Los Angeles, CA USA
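Keywords
data deduplication; edit channel; insertions/deletions
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract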
In this paper we tackle the problem of file deduplication for efficient data storage. We consider the case where deduplication is performed on files that have been modified by edit errors relative to the original version. We propose a novel variable-length, block-level deduplication algorithm for non-binary alphabets. Compared with hash-based deduplication algorithms, where deduplication depends on the content of the hash keys, and with brute-force methods that compare files symbol by symbol, our algorithm significantly reduces the number of symbol comparisons and achieves high deduplication ratios. We present a theoretical analysis of the cost of the algorithm relative to naive methods, together with experimental results that evaluate its efficiency.
Pages: 6
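The abstract contrasts the proposed method with hash-based deduplication under edit errors. The sketch below is not the paper's algorithm; it is a minimal illustration, under assumed parameters, of why a plain fixed-size hash-based baseline can struggle with insertions. The function names `block_hashes` and `dedup_ratio`, the 64-byte block size, and the synthetic 1024-byte file are all hypothetical choices made for this example.

```python
import hashlib
import random


def block_hashes(data: bytes, block_size: int) -> list:
    """Hash consecutive fixed-size blocks (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def dedup_ratio(original: bytes, edited: bytes, block_size: int) -> float:
    """Fraction of the edited file's blocks whose hash already exists
    among the original file's block hashes."""
    known = set(block_hashes(original, block_size))
    blocks = block_hashes(edited, block_size)
    return sum(h in known for h in blocks) / len(blocks)


# Synthetic, reproducible "file" (assumption: 1024 pseudo-random bytes).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(1024))

# Substitution: flip one byte in place; block boundaries are unchanged.
substituted = bytearray(original)
substituted[500] ^= 0xFF
substituted = bytes(substituted)

# Insertion: one extra byte at the front shifts every later block boundary.
inserted = b"\x00" + original

print(dedup_ratio(original, original, 64))
print(dedup_ratio(original, substituted, 64))
print(dedup_ratio(original, inserted, 64))
```

In this toy setting a single substitution invalidates only the block that contains it, while a single insertion misaligns every subsequent fixed-size block, so almost no block hashes match even though the two files differ by one symbol. This is the kind of sensitivity to insertions and deletions that motivates deduplication schemes designed for the edit channel.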