A Randomly Accessible Lossless Compression Scheme for Time-Series Data

Cited by: 0
Authors
Vestergaard, Rasmus [1]
Lucani, Daniel E.
Zhang, Qi
Affiliations
[1] Aarhus Univ, DIGIT, Aarhus, Denmark
DOI
10.1109/infocom41043.2020.9155450
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We detail a practical compression scheme for lossless compression of time-series data, based on the emerging concept of generalized deduplication. As data is no longer stored for just archival purposes, but needs to be continuously accessed in many applications, the scheme is designed for low-cost random access to its compressed data, avoiding decompression. With this method, an arbitrary bit of the original data can be read by accessing only a few hundred bits in the worst case, several orders of magnitude fewer than state-of-the-art compression schemes. Subsequent retrieval of bits requires visiting at most a few tens of bits. A comprehensive evaluation of the compressor on eight real-life data sets from various domains is provided. The cost of this random access capability is a loss in compression ratio compared with the state-of-the-art compression schemes BZIP2 and 7z, which can be as low as 5% depending on the data set. Compared to GZIP, the proposed scheme has a better compression ratio for most of the data sets. Our method has massive potential for applications requiring frequent random accesses, as the only existing approach with comparable random access cost is to store the data without compression.
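The abstract does not spell out the transformation behind generalized deduplication, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes each sample is a non-negative integer that is split into a "base" (high-order bits, deduplicated into a dictionary) and a "deviation" (low-order bits, stored verbatim). The names `compress`, `read_sample`, `stored_bits`, and the parameter `dev_bits` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, NamedTuple


class Compressed(NamedTuple):
    bases: List[int]       # unique bases, in order of first appearance
    base_ids: List[int]    # per-sample index into `bases`
    deviations: List[int]  # per-sample low-order bits, stored verbatim
    dev_bits: int          # number of low-order bits treated as deviation


def compress(samples: List[int], dev_bits: int) -> Compressed:
    """Split every sample into (base, deviation) and deduplicate the bases.

    Assumption: a plain bit split stands in for the transformation used
    in the paper, which the abstract does not describe.
    """
    mask = (1 << dev_bits) - 1
    index: Dict[int, int] = {}
    bases: List[int] = []
    base_ids: List[int] = []
    deviations: List[int] = []
    for s in samples:
        base, dev = s >> dev_bits, s & mask
        if base not in index:
            index[base] = len(bases)
            bases.append(base)
        base_ids.append(index[base])
        deviations.append(dev)
    return Compressed(bases, base_ids, deviations, dev_bits)


def read_sample(c: Compressed, i: int) -> int:
    """Recover sample i from one base ID, one base, and one deviation,
    without touching any other sample."""
    return (c.bases[c.base_ids[i]] << c.dev_bits) | c.deviations[i]


def stored_bits(c: Compressed, sample_bits: int) -> int:
    """Bit count of the sketch's representation with fixed-width fields."""
    base_bits = sample_bits - c.dev_bits
    id_bits = max(1, (len(c.bases) - 1).bit_length())
    return len(c.bases) * base_bits + len(c.base_ids) * (id_bits + c.dev_bits)


samples = [1000, 1003, 1001, 2050, 2049, 1002]
c = compress(samples, dev_bits=4)
restored = [read_sample(c, i) for i in range(len(samples))]
```

Because every base ID and deviation has a fixed width in this sketch, the position of sample `i` can be computed directly, which is the property that allows a read without decompressing the rest of the data. The figures quoted in the abstract (a few hundred bits per access) refer to the authors' actual scheme, not to this toy version.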
Pages: 2145-2154
Number of pages: 10