Cluster-based delta compression of a collection of files

Cited by: 0
Authors
Ouyang, Z [1]
Memon, N [1]
Suel, T [1]
Trendafilov, D [1]
Affiliations
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.
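The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: zlib with a preset dictionary stands in for a real delta compressor, byte-shingle Jaccard similarity stands in for text clustering, and a greedy rule (each file may reference only an earlier file, which keeps the plan acyclic) replaces the maximum-weight branching computation. The names `compressed_size`, `shingles`, `plan_deltas`, and the `candidates` parameter are hypothetical.

```python
import zlib
from typing import Dict, Optional


def compressed_size(data: bytes, reference: bytes = b"") -> int:
    """Size of `data` after DEFLATE, optionally primed with `reference`.

    Assumption: a zlib preset dictionary is only a stand-in for a real
    delta compressor; zlib can use at most 32 KiB of the dictionary.
    """
    if reference:
        comp = zlib.compressobj(level=9, zdict=reference)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def shingles(data: bytes, k: int = 8) -> set:
    """Set of all k-byte substrings, used as a cheap similarity signature."""
    return {data[i:i + k] for i in range(max(len(data) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def plan_deltas(files: Dict[str, bytes], candidates: int = 2) -> Dict[str, Optional[str]]:
    """Map each file name to a reference file name, or None to compress it alone.

    Pruning step: only the `candidates` most similar earlier files are
    tried as references, instead of every possible pair.
    """
    names = list(files)
    signatures = {name: shingles(files[name]) for name in names}
    plan: Dict[str, Optional[str]] = {}
    for idx, name in enumerate(names):
        earlier = names[:idx]
        nearest = sorted(
            earlier,
            key=lambda other: jaccard(signatures[name], signatures[other]),
            reverse=True,
        )[:candidates]
        # Start from the cost of compressing the file on its own.
        best_ref, best_size = None, compressed_size(files[name])
        for ref in nearest:
            size = compressed_size(files[name], files[ref])
            if size < best_size:
                best_ref, best_size = ref, size
        plan[name] = best_ref
    return plan


# Illustrative input only; the page contents are made up.
pages = {
    "index.html": b"<html><body><h1>News</h1><p>Delta compression of web pages.</p></body></html>",
    "index_v2.html": b"<html><body><h1>News</h1><p>Delta compression of many web pages.</p></body></html>",
    "about.txt": b"Contact the maintainers by post.",
}
print(plan_deltas(pages))
```

The greedy rule trades optimality for simplicity: it never produces a cycle of references, but unlike a maximum-weight branching it cannot pick a later file as the reference for an earlier one.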
Pages: 257-266
Page count: 10