Data deduplication with edit errors

Authors
Conde-Canencia, Laura [1]
Condie, Tyson [2]
Dolecek, Lara [3]
Affiliations
[1] Univ Bretagne Sud, CNRS UMR 6285, Lab STICC, Lorient, France
[2] Univ Calif Los Angeles, CS Dept, Los Angeles, CA USA
[3] Univ Calif Los Angeles, ECE Dept, Los Angeles, CA USA
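Keywords
data deduplication; edit channel; insertions/deletions
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract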
In this paper we tackle the problem of file deduplication for efficient data storage. We consider the case where deduplication is performed on files that have been modified by edit errors relative to the original version. We propose a novel block-level deduplication algorithm with variable-length blocks for non-binary alphabets. Compared to hash-based deduplication algorithms, where file deduplication depends on the content of the hash keys, and to brute-force methods that compare files symbol-by-symbol, our algorithm significantly reduces the number of symbol comparisons and achieves high deduplication ratios. We present a theoretical analysis of the cost of the algorithm compared to naive methods, along with experimental results that evaluate the efficiency of our deduplication algorithm.
Pages: 6
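The abstract contrasts the proposed method with hash-based deduplication. The following minimal sketch is not the authors' algorithm; it only illustrates, under simple assumptions, why edit errors are difficult for a fixed-length, hash-based baseline. The function names `fixed_blocks` and `dedup_ratio`, the 16-byte block size, and the synthetic test data are all hypothetical choices made for illustration. A substitution damages a single block, whereas one inserted symbol shifts every later block boundary, so no block hash matches.

```python
import hashlib


def fixed_blocks(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive fixed-length blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_ratio(original: bytes, edited: bytes, size: int) -> float:
    """Fraction of the edited file's blocks whose hash already occurs in the original."""
    known = {hashlib.sha256(b).digest() for b in fixed_blocks(original, size)}
    blocks = fixed_blocks(edited, size)
    hits = sum(hashlib.sha256(b).digest() in known for b in blocks)
    return hits / len(blocks)


# Synthetic 1024-byte file made of 512 distinct 2-byte counters (illustrative data only).
original = b"".join(i.to_bytes(2, "big") for i in range(512))

# One substitution error: only the block containing byte 100 changes.
substituted = original[:100] + b"\xff" + original[101:]

# One insertion error at the start: every block boundary shifts by one byte.
inserted = b"\xff" + original

print(dedup_ratio(original, original, 16))     # identical file
print(dedup_ratio(original, substituted, 16))  # one block lost
print(dedup_ratio(original, inserted, 16))     # alignment lost everywhere
```

On this synthetic input, the substitution leaves 63 of 64 blocks deduplicable, while the single insertion leaves none. Variable-length blocks, as named in the abstract, are one general way to recover alignment after insertions and deletions.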