DZip: improved general-purpose lossless compression based on novel neural network modeling

Cited by: 19
Authors
Goyal, Mohit [1 ]
Tatwawadi, Kedar [2 ]
Chandak, Shubham [2 ]
Ochoa, Idoia [1 ,3 ]
Affiliations
[1] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Univ Navarra, Dept Elect Engn, Pamplona, Spain
DOI
10.1109/DCC50243.2021.00023
CLC Number
TP31 [Computer Software]
Subject Classification
081202; 0835
Abstract
We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types. The proposed compressor outperforms general-purpose compressors such as Gzip (29% size reduction on average) and 7zip (12% size reduction on average) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. While the main limitation of NN-based compressors is generally the encoding/decoding speed, we empirically demonstrate that DZip achieves comparable compression ratio to other NN-based compressors while being several times faster. The source code for DZip and links to the datasets are available at https://github.com/mohit1997/Dzip-torch.
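The abstract describes a model-plus-arithmetic-coding pipeline: a predictor assigns a probability to each next symbol given the prefix seen so far, and an arithmetic coder converts those probabilities into a compact code. The following is a minimal illustrative sketch of that idea, not DZip's implementation: it swaps the neural predictor for a simple Laplace-smoothed count model and uses exact `Fraction` interval arithmetic instead of a finite-precision coder (so it is only practical for short inputs). All names here are hypothetical.

```python
from fractions import Fraction

def count_model(alphabet):
    """Laplace-smoothed count model: P(sym | prefix) from prefix counts.
    Stand-in for a learned predictor; any model both encoder and decoder
    can evaluate from the already-decoded prefix works the same way."""
    def probs(context):
        counts = {s: 1 for s in alphabet}          # add-one smoothing
        for s in context:
            counts[s] += 1
        total = sum(counts.values())
        return {s: Fraction(c, total) for s, c in counts.items()}
    return probs

def encode(seq, alphabet, model):
    """Shrink [low, high) to the subinterval of each observed symbol."""
    low, high = Fraction(0), Fraction(1)
    for i, sym in enumerate(seq):
        p = model(seq[:i])                         # prediction from prefix
        width = high - low
        cum = Fraction(0)
        for s in alphabet:
            if s == sym:
                low, high = low + cum * width, low + (cum + p[s]) * width
                break
            cum += p[s]
    return (low + high) / 2                        # any point in final interval

def decode(code, n, alphabet, model):
    """Invert encode(): find which symbol's subinterval contains the code."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n):
        p = model(out)                             # same prediction as encoder
        width = high - low
        cum = Fraction(0)
        for s in alphabet:
            lo2, hi2 = low + cum * width, low + (cum + p[s]) * width
            if lo2 <= code < hi2:
                low, high = lo2, hi2
                out.append(s)
                break
            cum += p[s]
    return ''.join(out)

seq = "ABRACADABRA"
alphabet = sorted(set(seq))
code = encode(seq, alphabet, count_model(alphabet))
decoded = decode(code, len(seq), alphabet, count_model(alphabet))
```

Because the model is a deterministic function of the prefix, the decoder reproduces the encoder's predictions exactly; this prefix-symmetry is what lets any sequential predictor (counts here, an NN in DZip) drive an arithmetic coder.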
Pages: 153 - 162 (10 pages)