Dictionary-based fast transform for text compression

Cited by: 7
Authors
Sun, WF [1]
Zhang, N [1]
Mukherjee, A [1]
Affiliations
[1] Univ Cent Florida, Sch Elect Engn & Comp Sci, Orlando, FL 32816 USA
DOI
10.1109/ITCC.2003.1197522
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we present StarNT, a fast, dictionary-based lossless text transform algorithm. With a static generic dictionary, StarNT achieves a better compression ratio than almost all other recent efforts based on BWT and PPM. The algorithm uses a ternary search tree to speed up transform encoding. Experimental results show that the average compression time improves by orders of magnitude over our previous algorithm, LIPT, and that the additional time overhead the transform adds to the backend compressor is negligible. Based on StarNT, we propose StarZip, a domain-specific lossless text compression utility. Using domain-specific static dictionaries embedded in the system, StarZip achieves an average improvement in compression performance, measured in bits per character (BPC), of 13% over bzip2 -9, 19% over gzip -9, and 10% over PPMD.
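The abstract credits a ternary search tree with speeding up transform encoding. The Python sketch below illustrates only the general idea of using such a tree to map dictionary words to short codewords. It is not the StarNT implementation: the names `TernarySearchTree`, `build_tree`, and `transform`, and the single-letter codeword assignment, are hypothetical choices made for illustration.

```python
from typing import Iterable, Optional


class _Node:
    __slots__ = ("ch", "lo", "eq", "hi", "code")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None   # characters smaller than ch
        self.eq: Optional["_Node"] = None   # next character of the word
        self.hi: Optional["_Node"] = None   # characters larger than ch
        self.code: Optional[str] = None     # set only where a dictionary word ends


class TernarySearchTree:
    """Maps dictionary words to codewords (hypothetical helper class)."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, word: str, code: str) -> None:
        if not word:
            raise ValueError("cannot insert an empty word")
        self.root = self._insert(self.root, word, 0, code)

    def _insert(self, node: Optional[_Node], word: str, i: int, code: str) -> _Node:
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, code)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, code)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, code)
        else:
            node.code = code
        return node

    def lookup(self, word: str) -> Optional[str]:
        """Return the codeword for `word`, or None if it is not in the dictionary."""
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node = node.eq
                i += 1
            else:
                return node.code
        return None


def build_tree(words: Iterable[str]) -> TernarySearchTree:
    # Assumption: toy codewords "a", "b", "c", ... in dictionary order.
    # A real transform needs a larger code space and an escape convention
    # so that out-of-dictionary words cannot be confused with codewords.
    tree = TernarySearchTree()
    for i, word in enumerate(words):
        tree.insert(word, chr(ord("a") + i))
    return tree


def transform(text: str, tree: TernarySearchTree) -> str:
    # Replace each dictionary word with its codeword; leave other words as-is.
    return " ".join(tree.lookup(w) or w for w in text.split(" "))


if __name__ == "__main__":
    tree = build_tree(["the", "of", "compression", "text"])
    print(transform("the theory of text compression", tree))
```

Each lookup inspects the word one character at a time and follows at most one branch per comparison, which is the property that makes this structure attractive for dictionary matching. Because the toy transform has no escape convention, it is not reversible in general.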
Pages: 176-182
Number of pages: 7