An Information-Theoretic Analysis of Deduplication

Times cited: 0
Author
Niesen, Urs [1]
Affiliation
[1] Qualcomm NJ Res Ctr, Bridgewater, NJ 08807 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Deduplication finds and removes long-range data duplicates. It is commonly used in cloud and enterprise server settings and has been successfully applied to primary, backup, and archival storage. Despite its practical importance as a source-coding technique, its analysis from the point of view of information theory is missing. This paper provides such an information-theoretic analysis of data deduplication. It introduces a new source model adapted to the deduplication setting. It formalizes both fixed- and variable-length deduplication schemes, and it introduces a novel multi-chunk deduplication scheme. It then provides an analysis of these three deduplication variants, emphasizing the importance of boundary synchronization between source blocks and deduplication chunks. The proposed multi-chunk deduplication scheme is shown to be order optimal under fairly mild assumptions.
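To make the abstract's point about boundary synchronization concrete, here is a minimal Python sketch. It is an illustrative toy, not the paper's source model and not its multi-chunk scheme. It compares fixed-length chunking with content-defined chunking, which is one common way to realize variable-length deduplication. The input is a random byte string plus a copy with three bytes inserted at the front. The helper names (`fixed_chunks`, `content_defined_chunks`, `stored_bytes`) are hypothetical, and so are the parameters: 64-byte chunks, a 16-byte window, and a 0x3F boundary mask. Each window is hashed from scratch with SHA-1. Practical systems typically use a rolling hash such as a Rabin fingerprint instead.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Split data into consecutive fixed-length chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut after position i whenever the hash of data[i - window:i] has its masked bits all zero.

    A boundary depends only on the bytes in the preceding window, so an edit
    cannot move boundaries that lie more than `window` bytes past the edit.
    (Hypothetical toy parameters; SHA-1 stands in for a rolling hash.)
    """
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        digest = hashlib.sha1(data[i - window:i]).digest()
        if (int.from_bytes(digest[:4], "big") & mask) == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def stored_bytes(versions, chunker) -> int:
    """Bytes kept by a toy dedup store that saves each distinct chunk (keyed by SHA-256) once."""
    store = {}
    for data in versions:
        for chunk in chunker(data):
            store.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return sum(store.values())


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = b"XYZ" + original  # a 3-byte insertion shifts every later byte

fixed = stored_bytes([original, edited], fixed_chunks)
cdc = stored_bytes([original, edited], content_defined_chunks)
print(f"raw: {len(original) + len(edited)}, fixed-length: {fixed}, content-defined: {cdc}")
```

With fixed-length chunks, the insertion misaligns every chunk of the edited copy, so essentially nothing is deduplicated. With content-defined chunks, every boundary more than `window` bytes past the insertion is shared by both copies. The store therefore holds roughly one copy plus the edited copy's first chunk. The exact figures depend on the seed and the assumed parameters.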
Pages: 1738-1742
Number of pages: 5
Related papers (50 in total)
  • [1] An Information-Theoretic Analysis of Deduplication
    Niesen, Urs
    IEEE Transactions on Information Theory, 2019, 65(9): 5688-5704
  • [2] Information-theoretic analysis of information hiding
    Moulin, P
    O'Sullivan, JA
    2000 IEEE International Symposium on Information Theory, Proceedings, 2000: 19
  • [3] Information-theoretic analysis of information hiding
    Moulin, P
    O'Sullivan, JA
    IEEE Transactions on Information Theory, 2003, 49(3): 563-593
  • [4] Information-theoretic analysis of watermarking
    Moulin, P
    O'Sullivan, JA
    2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI, 2000: 3630-3633
  • [5] Information-theoretic analysis of neural coding
    Johnson, DH
    Gruner, CM
    Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-6, 1998: 1937-1940
  • [6] Information-Theoretic Analysis of Neural Coding
    Johnson, Don H.
    Gruner, Charlotte M.
    Baggerly, Keith
    Seshagiri, Chandran
    Journal of Computational Neuroscience, 2001, 10: 47-69
  • [7] Information-Theoretic Analysis of Haplotype Assembly
    Si, Hongbo
    Vikalo, Haris
    Vishwanath, Sriram
    IEEE Transactions on Information Theory, 2017, 63(6): 3468-3479
  • [8] Information-Theoretic Analysis of Spherical Fingerprinting
    Moulin, Pierre
    Wang, Ying
    2009 Information Theory and Applications Workshop, 2009: 226 ff.
  • [9] Information-theoretic analysis of neural coding
    Johnson, DH
    Gruner, CM
    Baggerly, K
    Seshagiri, C
    Journal of Computational Neuroscience, 2001, 10(1): 47-69
  • [10] Information-theoretic analysis for transfer learning
    Wu, Xuetong
    Manton, Jonathan H.
    Aickelin, Uwe
    Zhu, Jingge
    2020 IEEE International Symposium on Information Theory (ISIT), 2020: 2819-2824