Cluster-based delta compression of a collection of files

Cited by: 0
Authors
Ouyang, Z [1]
Memon, N [1]
Suel, T [1]
Trendafilov, D [1]
Affiliation
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
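The reduction to an optimal branching mentioned in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: node 0 is a virtual empty file, an edge (0, j) costs the size of compressing file j on its own, and an edge (i, j) costs the size of a delta of file j against file i. The smallest encoding of the whole collection is then a minimum-cost spanning arborescence rooted at node 0, computed here with the Chu-Liu/Edmonds algorithm. This is equivalent to a maximum-weight branching over the files when each edge is weighted by the bytes it saves. zlib with a preset dictionary (`zdict`) is an assumed stand-in for a real delta compressor, and `candidate_pairs` is a hypothetical hook showing where a clustering step would prune edges. The sketch returns only the total size, not the chosen reference for each file.

```python
import zlib
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int, int]  # (source node, target node, cost in bytes)


def standalone_size(data: bytes) -> int:
    # Cost of encoding a file on its own (zlib used as a stand-in for gzip).
    return len(zlib.compress(data, 9))


def delta_size(ref: bytes, target: bytes) -> int:
    # ASSUMPTION: stand-in delta coder, not the paper's tool. Compress target
    # with ref as a zlib preset dictionary; zlib only uses the last 32 KiB of
    # the dictionary, so this is a rough estimate of a true delta size.
    if not ref:
        return standalone_size(target)
    comp = zlib.compressobj(level=9, zdict=ref)
    return len(comp.compress(target) + comp.flush())


def build_edges(files: List[bytes],
                candidate_pairs: Optional[Set[Tuple[int, int]]] = None) -> List[Edge]:
    # Node 0 is a virtual empty file; files become nodes 1..n.
    # candidate_pairs (hypothetical pruning hook, 0-based file indices): if
    # given, only those (reference, target) pairs receive a delta edge.
    edges: List[Edge] = []
    for j, target in enumerate(files, start=1):
        edges.append((0, j, standalone_size(target)))
        for i, ref in enumerate(files, start=1):
            if i == j:
                continue
            if candidate_pairs is not None and (i - 1, j - 1) not in candidate_pairs:
                continue
            edges.append((i, j, delta_size(ref, target)))
    return edges


def min_arborescence_cost(n: int, root: int, edges: Iterable[Edge]) -> Optional[int]:
    """Chu-Liu/Edmonds: total cost of a minimum spanning arborescence rooted
    at `root`, or None if some node cannot be reached."""
    INF = float("inf")
    edges = list(edges)
    total = 0
    while True:
        # 1. Cheapest incoming edge for every node.
        in_cost = [INF] * n
        parent = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_cost[v]:
                in_cost[v], parent[v] = w, u
        if any(v != root and in_cost[v] == INF for v in range(n)):
            return None
        in_cost[root] = 0
        # 2. Take those edges and detect cycles among them.
        comp = [-1] * n  # contracted-node id for each node
        seen = [-1] * n
        n_cycles = 0
        for v in range(n):
            total += in_cost[v]
            x = v
            while seen[x] != v and comp[x] == -1 and x != root:
                seen[x] = v
                x = parent[x]
            if x != root and comp[x] == -1:  # walked back onto this path: a cycle
                y = parent[x]
                while y != x:
                    comp[y] = n_cycles
                    y = parent[y]
                comp[x] = n_cycles
                n_cycles += 1
        if n_cycles == 0:
            return total  # chosen edges already form an arborescence
        # 3. Contract each cycle into one node and re-weight entering edges.
        next_id = n_cycles
        for v in range(n):
            if comp[v] == -1:
                comp[v] = next_id
                next_id += 1
        edges = [(comp[u], comp[v], w - in_cost[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        n, root = next_id, comp[root]


if __name__ == "__main__":
    page = b"<html><body>" + b"shared navigation and boilerplate " * 40
    files = [page + b"story A</body></html>",
             page + b"story B</body></html>",
             b"an unrelated plain-text file " * 10]
    best = min_arborescence_cost(len(files) + 1, 0, build_edges(files))
    alone = sum(standalone_size(f) for f in files)
    print(f"individually: {alone} bytes, optimal delta plan: {best} bytes")
```

The complete graph needs one delta computation for every ordered pair of files, so the work grows quadratically with the collection size. Passing a `candidate_pairs` set that contains only pairs a clustering step grouped together shrinks the edge list before the arborescence is computed. Because the all-standalone plan is always a valid arborescence, the optimum can never be larger than compressing each file individually.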
Pages: 257-266
Number of pages: 10
Related papers (50 in total)
  • [21] Cluster-based two-branch framework for point cloud attribute compression
    Sun, Longhua
    Wang, Jin
    Zhu, Qing
    Liu, Jiaying
    Yu, Jiawen
    VISUAL COMPUTER, 2024, 40 (09): 5947-5960
  • [22] XCluster: A cluster-based queriable multi-document XML compression method
    Zhao, Ming
    Luo, Jizhou
    Li, Jianzhong
    Gao, Hong
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2010, 47 (05): 804-814
  • [23] Cluster-Based Arithmetic Coding for Data Provenance Compression in Wireless Sensor Networks
    Xu, Qinbao
    Akhtar, Rizwan
    Zhang, Xing
    Wang, Changda
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2018
  • [24] Distributed Information Compression for Target Tracking in Cluster-Based Wireless Sensor Networks
    Liao, Shi-Kuan
    Lai, Kai-Jay
    Tsai, Hsiao-Ping
    Wen, Chih-Yu
    SENSORS, 2016, 16 (06)
  • [25] Streaming Compression Multimedia Data over WMSNs based on Fairness Cluster-based Routing Protocol
    Sarhadi, Bahar
    Abouei, Jamshid
    Hajiakhondi-Meybodi, Zohreh
    Mohammadi, Arash
    Plataniotis, Konstantinos N.
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021: 661-666
  • [26] Structural Compressed Network Coding for Data Collection in Cluster-Based Wireless Sensor Networks
    Zhao, Yimin
    Xiao, Song
    Gan, Hongping
    Li, Lizhao
    Xiao, Lina
    IEICE TRANSACTIONS ON COMMUNICATIONS, 2019, E102B (11): 2126-2138
  • [27] Cluster-Based Vehicular Data Collection for Efficient LTE Machine-Type Communication
    Ide, Christoph
    Kurtz, Fabian
    Wietfeld, Christian
    2013 IEEE 78TH VEHICULAR TECHNOLOGY CONFERENCE (VTC FALL), 2013
  • [28] Cluster-Based Query Expansion
    Kalmanovich, Inna Gelfer
    Kurland, Oren
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009: 646-647
  • [29] Cluster-based outlier detection
    Duan, Lian
    Xu, Lida
    Liu, Ying
    Lee, Jun
    Annals of Operations Research, 2009, 168: 151-168
  • [30] Cluster Integration for the Cluster-Based Instance Selection
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2010, 6421: 353-362