Dictionary-based order-preserving string compression

Cited by: 17
Author
Antoshenkov G. [1]
Affiliation
[1] Oracle Corporation, New England Development Center, Nashua, NH 03062
Keywords
Indexing; Order-preserving key compression
DOI
10.1007/s007780050031
Abstract
Just as no database exists without indexes, no index implementation exists without order-preserving key compression, in particular prefix and tail compression. However, despite the great potential for making indexes smaller and faster, the application of general compression methods to ordered data sets has advanced very little. This paper demonstrates that fast dictionary-based methods can be applied to order-preserving compression with almost the same freedom as in the general case. The proposed technology has the same speed as traditional order-indifferent dictionary encoding and a compression rate only marginally lower. Procedures for encoding and for generating the encoding tables are described. They cover such order-related features as ordered data set restrictions, sensitivity and insensitivity to character position, and one-symbol encoding of each frequent trailing character sequence. The experimental results presented demonstrate fivefold compression on real-life data sets and twelvefold compression on Wisconsin benchmark text fields.
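The core requirement named in the abstract is that encoded keys must compare in the same order as the original strings. Two much simpler, well-known techniques can illustrate it. The Python sketch below is not the paper's encoding algorithm. It assumes two swapped-in methods:

- whole-value order-preserving dictionary encoding, where integer codes are assigned to distinct strings in sorted order;
- classic front coding for the prefix compression mentioned above.

All function names are hypothetical. "Sorted" here means Python's code-point string order, which may differ from the collation the paper targets.

```python
from bisect import bisect_left

# Minimal sketch, NOT the paper's algorithm: whole-value order-preserving
# dictionary encoding plus front coding. All names here are hypothetical.


def build_dictionary(values):
    """Assign integer codes to the distinct strings in sorted order.

    Since codes are handed out in sort order, a < b holds exactly when
    code_of[a] < code_of[b], so comparisons can run on the codes alone.
    """
    ordered = sorted(set(values))
    code_of = {s: i for i, s in enumerate(ordered)}
    return ordered, code_of


def encode(values, code_of):
    """Replace each string by its dictionary code."""
    return [code_of[v] for v in values]


def decode(codes, ordered):
    """Map codes back to strings; ordered[i] is the string with code i."""
    return [ordered[c] for c in codes]


def code_lower_bound(ordered, s):
    """Smallest code whose string is >= s, even if s is not in the dictionary.

    This turns a string range predicate (value >= s) into an integer
    predicate on codes (code >= bound), which only works because the
    encoding preserves order.
    """
    return bisect_left(ordered, s)


def front_code(sorted_keys):
    """Prefix (front) compression of sorted keys.

    Each key is stored as (length of prefix shared with the previous key,
    remaining suffix).
    """
    entries = []
    prev = ""
    for key in sorted_keys:
        n = 0
        limit = min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:
            n += 1
        entries.append((n, key[n:]))
        prev = key
    return entries


def front_decode(entries):
    """Rebuild the keys by reusing each previous key's prefix."""
    keys = []
    prev = ""
    for n, suffix in entries:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    rows = ["nashua", "boston", "nashua", "concord", "boston"]
    ordered, code_of = build_dictionary(rows)
    codes = encode(rows, code_of)
    print(codes)  # [2, 0, 2, 1, 0]
    bound = code_lower_bound(ordered, "c")
    print([v for v, c in zip(rows, codes) if c >= bound])  # ['nashua', 'nashua', 'concord']
    print(front_code(["index", "indexes", "indexing", "order"]))
    # [(0, 'index'), (5, 'es'), (5, 'ing'), (0, 'order')]
```

Whole-value codes need the full set of values in advance and only pay off when values repeat. The abstract instead describes a substring-level dictionary scheme with additional order-related constraints, which this sketch does not attempt to reproduce.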
Pages: 26-39
Page count: 13