Data deduplication with edit errors

Cited by: 0
Authors
Conde-Canencia, Laura [1]
Condie, Tyson [2]
Dolecek, Lara [3]
Affiliations
[1] Univ Bretagne Sud, CNRS UMR 6285, Lab STICC, Lorient, France
[2] Univ Calif Los Angeles, CS Dept, Los Angeles, CA USA
[3] Univ Calif Los Angeles, ECE Dept, Los Angeles, CA USA
Keywords
data deduplication; edit channel; insertions/deletions
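DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract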
In this paper, we tackle the problem of file deduplication for efficient data storage. We consider the case where deduplication is performed on files that have been modified by edit errors relative to the original version. We propose a novel variable-length, block-level deduplication algorithm for non-binary alphabets. Compared to hash-based deduplication algorithms, where file deduplication depends on the content of the hash keys, or to brute-force methods that compare files symbol by symbol, our algorithm significantly reduces the number of symbol comparisons and achieves high deduplication ratios. We present a theoretical analysis of the cost of the algorithm compared to naive methods, along with experimental results that evaluate the efficiency of our deduplication algorithm.
Pages: 6
Related papers
50 in total
  • [1] Distributed Data Deduplication
    Chu, Xu
    Ilyas, Ihab F.
    Koutris, Paraschos
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (11): : 864 - 875
  • [2] Data Deduplication with Random Substitutions
    Lou, Hao
    Farnoud, Farzad
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 2377 - 2382
  • [3] PerfectDedup: Secure Data Deduplication
    Puzio, Pasquale
    Molva, Refik
    Onen, Melek
    Loureiro, Sergio
    DATA PRIVACY MANAGEMENT, AND SECURITY ASSURANCE, 2016, 9481 : 150 - 166
  • [4] A Global Survey on Data Deduplication
    Singhal, Shubhanshi
    Sharma, Pooja
    Aggarwal, Rajesh Kumar
    Passricha, Vishal
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (04) : 43 - 66
  • [5] An Overview on Data Deduplication Techniques
    Zhang, Xuecheng
    Deng, Mingzhu
    INFORMATION TECHNOLOGY AND INTELLIGENT TRANSPORTATION SYSTEMS, VOL 2, 2017, 455 : 359 - 369
  • [6] Transparent Data Deduplication in the Cloud
    Armknecht, Frederik
    Bohli, Jens-Matthias
    Karame, Ghassan O.
    Youssef, Franck
    CCS'15: PROCEEDINGS OF THE 22ND ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2015, : 886 - 900
  • [7] Data Deduplication With Random Substitutions
    Lou, Hao
    Farnoud, Farzad
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (10) : 6941 - 6963
  • [8] Data Deduplication based on Hadoop
    Zhang, Dongzhan
    Liao, Chengfa
    Yan, Wenjing
    Tao, Ran
    Zheng, Wei
    2017 FIFTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD), 2017, : 147 - 152
  • [9] Sensitive Cloud Data Deduplication with Data Dynamics
    Wang, Yan
    Liu, Ying
    Li, Chaoling
    MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 6236 - 6240
  • [10] Probabilistic data generation for deduplication and data linkage
    Christen, P
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 109 - 116