Cluster-based delta compression of a collection of files

Cited by: 0
Authors
Ouyang, Z [1]
Memon, N [1]
Suel, T [1]
Trendafilov, D [1]
Affiliations
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
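The pipeline the abstract describes (prune the graph of candidate pairwise deltas, then choose one reference per file without creating cycles) can be illustrated with a small sketch. This is not the authors' implementation, and every name in it is hypothetical. It makes three loud assumptions: a zlib preset dictionary stands in for a real delta compressor, Jaccard similarity over byte shingles stands in for the text clustering step, and a greedy edge selection stands in for an exact maximum-weight branching algorithm, so the result is a valid branching but not necessarily an optimal one.

```python
import hashlib
import zlib


def compressed_size(data: bytes, reference: bytes = b"") -> int:
    """Bytes needed for `data`, alone or relative to `reference`.

    Assumption: a zlib preset dictionary is only a stand-in for a delta
    coder; DEFLATE can use at most the last 32 KiB of the reference.
    """
    if reference:
        comp = zlib.compressobj(level=9, zdict=reference)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def shingles(data: bytes, k: int = 8) -> set:
    """Set of all k-byte substrings, used as a cheap similarity signature."""
    return {data[i:i + k] for i in range(max(len(data) - k + 1, 1))}


def candidate_edges(files: dict, threshold: float = 0.2) -> list:
    """Prune the complete graph: keep only pairs with enough shingle overlap."""
    sigs = {name: shingles(data) for name, data in files.items()}
    names = sorted(files)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            union = len(sigs[u] | sigs[v])
            if union and len(sigs[u] & sigs[v]) / union >= threshold:
                edges += [(u, v), (v, u)]  # either file may be the reference
    return edges


def greedy_branching(files: dict, edges: list) -> dict:
    """Map each delta-coded file to its reference file (heuristic, not optimal)."""
    alone = {name: compressed_size(data) for name, data in files.items()}
    weighted = []
    for ref, tgt in edges:
        saving = alone[tgt] - compressed_size(files[tgt], files[ref])
        if saving > 0:
            weighted.append((saving, ref, tgt))
    parent = {}
    for _, ref, tgt in sorted(weighted, reverse=True):
        if tgt in parent:
            continue  # a branching allows at most one incoming edge per node
        node = ref
        while node is not None and node != tgt:
            node = parent.get(node)  # walk up to the root of ref's tree
        if node is None:  # tgt is not an ancestor of ref, so no cycle
            parent[tgt] = ref
    return parent


def total_size(files: dict, parent: dict) -> int:
    """Total compressed bytes given the chosen reference assignments."""
    return sum(
        compressed_size(data, files[parent[name]] if name in parent else b"")
        for name, data in files.items()
    )


def pseudo_page(seed: bytes, blocks: int = 64) -> bytes:
    """Hypothetical test data: deterministic, poorly compressible bytes."""
    return b"".join(hashlib.sha256(seed + bytes([i])).digest() for i in range(blocks))


shared = pseudo_page(b"template")
files = {
    "a.html": shared + b"<p>version A</p>",
    "b.html": shared + b"<p>version B</p>",
    "c.html": pseudo_page(b"unrelated"),
}
parent = greedy_branching(files, candidate_edges(files))
print(parent, total_size(files, parent), total_size(files, {}))
```

In this toy collection the two near-identical pages end up linked by a single delta edge, while the unrelated page is compressed on its own. The pruning step is what keeps the number of trial compressions well below the quadratic cost of the complete graph.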
Pages: 257-266
Page count: 10
Related papers
50 items in total
  • [31] Cluster-based cumulative ensembles
    Ayad, HG
    Kamel, MS
    MULTIPLE CLASSIFIER SYSTEMS, 2005, 3541 : 236 - 245
  • [32] Cluster-based ensemble of classifiers
    Rahman, Ashfaqur
    Verma, Brijesh
    EXPERT SYSTEMS, 2013, 30 (03) : 270 - 282
  • [33] Cluster-based holey semiconductors
    Huesing, Nicola
    ANGEWANDTE CHEMIE-INTERNATIONAL EDITION, 2008, 47 (11) : 1992 - 1994
  • [34] Cluster-based learning in Herefordshire
    Saadi, Hasan
    EDUCATION FOR PRIMARY CARE, 2010, 21 (05) : 330 - 331
  • [35] Cluster-based outlier detection
    Duan, Lian
    Xu, Lida
    Liu, Ying
    Lee, Jun
    ANNALS OF OPERATIONS RESEARCH, 2009, 168 (01) : 151 - 168
  • [36] Cluster-based tangible programming
    Smith, Andrew Cyrus
    2014 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION AND COMMUNICATION TECHNOLOGY AND IT'S APPLICATIONS (DICTAP), 2014, : 405 - 410
  • [37] Cluster-based WDM network
    Li, Yiwu
    Li, Lemin
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 1998, 26 (04): : 94 - 97
  • [38] Cluster-Based Focused Retrieval
    Sheetrit, Eilon
    Kurland, Oren
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2305 - 2308
  • [39] A cluster-based factor rotation
    Yamamoto, Michio
    Jennrich, Robert I.
    BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2013, 66 (03): : 487 - 502
  • [40] Cluster-Based Distributed Consensus
    Li, Wenjun
    Dai, Huaiyu
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2009, 8 (01) : 28 - 31