A Randomly Accessible Lossless Compression Scheme for Time-Series Data

Authors
Vestergaard, Rasmus [1]
Lucani, Daniel E.
Zhang, Qi
Affiliations
[1] Aarhus Univ, DIGIT, Aarhus, Denmark
DOI
10.1109/infocom41043.2020.9155450
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We detail a practical compression scheme for lossless compression of time-series data, based on the emerging concept of generalized deduplication. As data is no longer stored for just archival purposes, but needs to be continuously accessed in many applications, the scheme is designed for low-cost random access to its compressed data, avoiding decompression. With this method, an arbitrary bit of the original data can be read by accessing only a few hundred bits in the worst case, several orders of magnitude fewer than state-of-the-art compression schemes. Subsequent retrieval of bits requires visiting at most a few tens of bits. A comprehensive evaluation of the compressor on eight real-life data sets from various domains is provided. The cost of this random access capability is a loss in compression ratio compared with the state-of-the-art compression schemes BZIP2 and 7z, which can be as low as 5% depending on the data set. Compared to GZIP, the proposed scheme has a better compression ratio for most of the data sets. Our method has massive potential for applications requiring frequent random accesses, as the only existing approach with comparable random access cost is to store the data without compression.
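The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea behind generalized deduplication with cheap random access, not the authors' actual scheme. It assumes non-negative integer samples and a simple bit split (high bits as the "base", low `dev_bits` bits as the "deviation"); the names `compress`, `read_sample`, and `dev_bits` are hypothetical. Each distinct base is stored once, and every sample keeps a fixed-size (base id, deviation) entry, so sample `i` can be rebuilt from one entry and one base without touching the rest of the data.

```python
from typing import Dict, List, Tuple


def compress(samples: List[int], dev_bits: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split each sample into (base, deviation) and store each distinct base once.

    Assumption for illustration: the base is the sample's high bits and the
    deviation is its low `dev_bits` bits. Samples must be non-negative.
    """
    bases: List[int] = []                  # deduplicated bases, first-seen order
    base_ids: Dict[int, int] = {}          # base value -> index into `bases`
    entries: List[Tuple[int, int]] = []    # one (base_id, deviation) per sample
    mask = (1 << dev_bits) - 1
    for sample in samples:
        base, deviation = sample >> dev_bits, sample & mask
        if base not in base_ids:
            base_ids[base] = len(bases)
            bases.append(base)
        entries.append((base_ids[base], deviation))
    return bases, entries


def read_sample(bases: List[int], entries: List[Tuple[int, int]],
                dev_bits: int, i: int) -> int:
    """Random access: rebuild sample i from a single entry and a single base."""
    base_id, deviation = entries[i]
    return (bases[base_id] << dev_bits) | deviation


samples = [1000, 1003, 1001, 2047, 1002]
bases, entries = compress(samples, dev_bits=4)
restored = [read_sample(bases, entries, 4, i) for i in range(len(samples))]
```

In this toy run, four of the five samples share one base, which is where the space saving would come from when nearby time-series values differ only in their low bits. A real encoder would also pack the entries into fixed-width bit fields; that step is omitted here.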
Pages: 2145-2154
Number of pages: 10