A Learned Approach to Design Compressed Rank/Select Data Structures

Cited by: 9
Authors
Boffa, Antonio [1]
Ferragina, Paolo [1]
Vinciguerra, Giorgio [1]
Affiliations
[1] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
Keywords
Compressed data structures; rank/select dictionaries; piecewise linear approximations; high order entropy; algorithm engineering; RANK; REPRESENTATION; RETRIEVAL; STORAGE
DOI
10.1145/3524060
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
We address the problem of designing, implementing, and experimenting with compressed data structures that support rank and select queries over a dictionary of integers. We shine a new light on this classical problem by showing a connection between the input integers and the geometry of a set of points in a Cartesian plane suitably derived from them. We then build upon some results in computational geometry to introduce the first compressed rank/select dictionary based on the idea of "learning" the distribution of such points via proper linear approximations (LA). We therefore call this novel data structure the la_vector. We prove time and space complexities of the la_vector in several scenarios: in the worst case, in the case of input distributions with finite mean and variance, and taking into account the kth-order entropy of some of its building blocks. We also discuss improved hybrid data structures, namely, ones that suitably orchestrate known compressed rank/select dictionaries with the la_vector. We corroborate our theoretical results with a large set of experiments over datasets originating from a variety of applications (Web search, DNA sequencing, information retrieval, and natural language processing) and show that our approach provides new interesting space-time tradeoffs with respect to many well-established compressed rank/select dictionary implementations. In particular, we show that our select is the fastest, and our rank is on the space-time Pareto frontier.
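As a rough illustration of the idea in the abstract, the sketch below maps a sorted set of integers to the points (i, x_i), fits one straight line through them, and stores only the small integer corrections needed to recover each x_i exactly. It is a toy under stated assumptions, not the la_vector itself: the class name `SingleSegmentDictionary` and its methods are hypothetical, a single segment from the first to the last point stands in for the piecewise linear approximation, the corrections are kept in a plain Python list rather than packed into fixed-width bits, and rank(x) is assumed to mean "number of elements less than or equal to x" with select(i) returning the i-th smallest element (1-based).

```python
from math import floor


class SingleSegmentDictionary:
    """Toy rank/select dictionary: one line plus per-element corrections.

    Hypothetical illustration only; not the la_vector from the paper.
    """

    def __init__(self, sorted_values):
        # Assumes a non-empty, strictly increasing list of integers.
        self.n = len(sorted_values)
        first, last = sorted_values[0], sorted_values[-1]
        # Line through the first and last points (0, first) and (n-1, last).
        self.slope = (last - first) / (self.n - 1) if self.n > 1 else 0.0
        self.intercept = first
        # Correction = true value minus the line's integer prediction.
        self.corrections = [
            x - self._predict(i) for i, x in enumerate(sorted_values)
        ]

    def _predict(self, i):
        """Integer prediction of the value at 0-based position i."""
        return floor(self.slope * i + self.intercept)

    def select(self, i):
        """Return the i-th smallest element (1-based)."""
        return self._predict(i - 1) + self.corrections[i - 1]

    def rank(self, x):
        """Return the number of stored elements <= x (binary search on select)."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.select(mid + 1) <= x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def max_abs_correction(self):
        """Largest correction magnitude; it bounds the bits needed per element."""
        return max(abs(c) for c in self.corrections)


if __name__ == "__main__":
    d = SingleSegmentDictionary([3, 7, 11, 16, 19, 23])
    print(d.select(4), d.rank(16), d.max_abs_correction())
```

The closer the points lie to the line, the smaller the corrections and the fewer bits each one would need, which is the intuition behind trading a learned approximation for space.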
Pages: 28