Data deduplication with edit errors

Cited by: 0
Authors
Conde-Canencia, Laura [1]
Condie, Tyson [2]
Dolecek, Lara [3]
Affiliations
[1] Univ Bretagne Sud, CNRS UMR 6285, Lab STICC, Lorient, France
[2] Univ Calif Los Angeles, CS Dept, Los Angeles, CA USA
[3] Univ Calif Los Angeles, ECE Dept, Los Angeles, CA USA
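Keywords
data deduplication; edit channel; insertions/deletions
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract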
In this paper we tackle the problem of file deduplication for efficient data storage. We consider the case where deduplication is performed on files that have been modified by edit errors relative to the original version. We propose a novel block-level deduplication algorithm with variable-length blocks for non-binary alphabets. Compared to hash-based deduplication algorithms, where file deduplication depends on the content of the hash keys, or to brute-force methods that compare files symbol by symbol, our algorithm significantly reduces the number of symbol comparisons and achieves high deduplication ratios. We present a theoretical analysis of the cost of the algorithm compared to naive methods, and experimental results that evaluate the efficiency of our deduplication algorithm.
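The abstract does not describe the proposed algorithm itself, so the following is only a minimal sketch of the baseline it argues against: hash-based block deduplication applied to a file that differs from the original by a single inserted symbol. Everything in it is an assumption made for illustration, including the function names, the block size, and the toy boundary rule (end a block after any byte whose value is 7 modulo 8). Practical content-defined chunkers typically derive boundaries from a rolling hash over a window of bytes instead.

```python
import hashlib


def fixed_blocks(data: bytes, size: int = 8) -> list[bytes]:
    """Split data into fixed-length blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_blocks(data: bytes, modulus: int = 8) -> list[bytes]:
    """Toy content-defined split (assumed rule, not from the paper):
    end a block after any byte b with b % modulus == modulus - 1."""
    blocks, start = [], 0
    for i, b in enumerate(data):
        if b % modulus == modulus - 1:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # trailing block without a boundary byte
    return blocks


def reused_blocks(original: list[bytes], edited: list[bytes]) -> int:
    """Count blocks of the edited file whose hash is already stored."""
    stored = {hashlib.sha256(b).digest() for b in original}
    return sum(hashlib.sha256(b).digest() in stored for b in edited)


original = bytes(range(64))
edited = original[:3] + b"\x80" + original[3:]  # one inserted symbol

fixed_reuse = reused_blocks(fixed_blocks(original), fixed_blocks(edited))
variable_reuse = reused_blocks(variable_blocks(original), variable_blocks(edited))
print(fixed_reuse, variable_reuse)
```

On this input the insertion shifts every fixed-length block, so none of their hashes match, while the variable-length split loses only the block that contains the inserted symbol and reuses the other 7 of 8.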
Pages: 6