Cluster-based delta compression of a collection of files

Cited by: 0
Authors
Ouyang, Z [1]
Memon, N [1]
Suel, T [1]
Trendafilov, D [1]
Affiliation
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
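The reduction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: zlib with a preset dictionary (`zdict`) stands in for a real delta compressor, the pruning step simply keeps the k most similar candidate references per file by byte-shingle Jaccard similarity (a hypothetical stand-in for the paper's text clustering), and all helper names (`delta_size`, `max_arborescence`, `plan`, etc.) are made up for illustration. A virtual root node 0 means "compress this file on its own"; an edge u to v is weighted by the bytes saved when v is delta-encoded against u, and the Chu-Liu/Edmonds algorithm picks the maximum-weight branching.

```python
import zlib


def standalone_size(data: bytes) -> int:
    # Cost of compressing a file on its own (zlib stands in for gzip).
    return len(zlib.compress(data, 9))


def delta_size(ref: bytes, target: bytes) -> int:
    # Crude delta (assumption): compress target with ref as a zlib preset
    # dictionary. zlib only looks at the last 32 KB of the dictionary.
    comp = zlib.compressobj(level=9, zdict=ref)
    return len(comp.compress(target) + comp.flush())


def shingles(data: bytes, w: int = 8) -> set:
    # All w-byte substrings: a cheap similarity signature for pruning.
    return {data[i:i + w] for i in range(max(1, len(data) - w + 1))}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a or b else 0.0


def find_cycle(parent: dict):
    # Return the nodes of one cycle in the parent map {v: u}, or None.
    for start in parent:
        order, pos, v = [], {}, start
        while v in parent and v not in pos:
            pos[v] = len(order)
            order.append(v)
            v = parent[v]
        if v in pos:
            return order[pos[v]:]
    return None


def max_arborescence(nodes: set, root: int, edges: dict) -> dict:
    # Chu-Liu/Edmonds: maximum-weight spanning arborescence rooted at root.
    # edges maps (u, v) -> weight; the result is a parent map {v: u}.
    best = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best or w > edges[(best[v], v)]):
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best
    cyc, c = set(cycle), max(nodes) + 1  # c is the contracted cycle's id
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:  # entering the cycle means dropping best[v] -> v
            key, w = (u, c), w - edges[(best[v], v)]
        elif u in cyc:
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_edges or w > new_edges[key]:
            new_edges[key], origin[key] = w, (u, v)
    sub = max_arborescence((nodes - cyc) | {c}, root, new_edges)
    parent = {}
    for v, u in sub.items():  # expand contracted edges back to this level
        ou, ov = origin[(u, v)]
        parent[ov] = ou
    entered = origin[(sub[c], c)][1]
    for v in cycle:  # keep the cycle except the edge into `entered`
        if v != entered:
            parent[v] = best[v]
    return parent


def plan(files: list, k: int = 2) -> dict:
    # Node 0 is a virtual root ("compress on its own"); files are 1..n.
    # Hypothetical pruning: keep only the k most similar candidate references.
    n = len(files)
    sigs = [shingles(f) for f in files]
    edges = {(0, v + 1): 0 for v in range(n)}
    for v in range(n):
        refs = sorted((u for u in range(n) if u != v),
                      key=lambda u: jaccard(sigs[u], sigs[v]), reverse=True)
        base = standalone_size(files[v])
        for u in refs[:k]:  # weight = bytes saved by delta-encoding v against u
            edges[(u + 1, v + 1)] = base - delta_size(files[u], files[v])
    return max_arborescence(set(range(n + 1)), 0, edges)


page = b"<html><body>" + b"".join(
    b"<p>shared paragraph %d</p>" % i for i in range(200)) + b"</body></html>"
files = [page, page.replace(b"<body>", b"<body><h1>v2</h1>"),
         page + b"<p>extra</p>", b"unrelated text " * 50]
parent = plan(files)
total = sum(standalone_size(files[v - 1]) if u == 0
            else delta_size(files[u - 1], files[v - 1]) for v, u in parent.items())
baseline = sum(standalone_size(f) for f in files)
print(parent, total, baseline)
```

Because the tree that attaches every file to the root has weight 0, the chosen plan in this sketch is never larger than compressing each file separately. Decompression has to follow the tree from the root, restoring each reference before the files delta-encoded against it. Pruning to k candidates reduces the number of pairwise delta computations from about n² to about nk, which is the scaling concern the abstract raises.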
Pages: 257-266
Number of pages: 10
Related papers (50 in total)
  • [41] On Cluster-Based Channel Identification
    Wang, P.
    Ser, W.
    2012 INTERNATIONAL WORKSHOP ON INFORMATION AND ELECTRONICS ENGINEERING, 2012, 29 : 2699 - 2704
  • [42] Cluster-based network model
    Li, Hao
    Fernex, Daniel
    Semaan, Richard
    Tan, Jianguo
    Morzynski, Marek
    Noack, Bernd R.
    JOURNAL OF FLUID MECHANICS, 2021, 906 (906)
  • [43] Cluster-based patent retrieval
    Kang, In-Su
    Na, Seung-Hoon
    Kim, Jungi
    Lee, Jong-Hyeok
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (05) : 1173 - 1182
  • [44] Cluster-based virtual router
    Ge, JG
    Qian, HL
    2001 INTERNATIONAL CONFERENCES ON INFO-TECH AND INFO-NET PROCEEDINGS, CONFERENCE A-G: INFO-TECH & INFO-NET: A KEY TO BETTER LIFE, 2001 : B102 - B109
  • [45] Cluster-Based Irresponsible Forwarding
    Busanelli, Stefano
    Ferrari, Gianluigi
    Panichpapiboon, Sooksan
    INTERNET OF THINGS-BOOK, 2010 : 59 - +
  • [46] A Modified LZW Algorithm Based on a Character String Parallel Search in Cluster-Based Telemetry Data Compression
    He, Yigen
    Shi, Xuesen
    Wang, Yongqing
    ELECTRONICS, 2022, 11 (17)
  • [47] Artificial Intelligence-Enabled Cooperative Cluster-Based Data Collection for Unmanned Aerial Vehicles
    Rajender, R.
    Anupama, C. S. S.
    Moses, G. Jose
    Lydia, E. Laxmi
    Kadry, Seifedine
    Lim, Sangsoon
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (02) : 3351 - 3365
  • [48] Loop acceleration by cluster-based CGRA
    Zhou, Li
    Liu, Hengzhu
    Zhang, Jianfeng
    IEICE ELECTRONICS EXPRESS, 2013, 10 (16)
  • [49] Cluster-based adaptive metric classification
    Giotis, Ioannis
    Petkov, Nicolai
    NEUROCOMPUTING, 2012, 81 : 33 - 40
  • [50] Cluster-based visualisation of marketing data
    Lisboa, PJG
    Patel, S
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 552 - 558