A Learned Approach to Design Compressed Rank/Select Data Structures

Cited by: 9
Authors
Boffa, Antonio [1]
Ferragina, Paolo [1]
Vinciguerra, Giorgio [1]
Affiliations
[1] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
Keywords
Compressed data structures; rank/select dictionaries; piecewise linear approximations; high order entropy; algorithm engineering; RANK; REPRESENTATION; RETRIEVAL; STORAGE
DOI
10.1145/3524060
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We address the problem of designing, implementing, and experimenting with compressed data structures that support rank and select queries over a dictionary of integers. We shine a new light on this classical problem by showing a connection between the input integers and the geometry of a set of points in a Cartesian plane suitably derived from them. We then build upon some results in computational geometry to introduce the first compressed rank/select dictionary based on the idea of "learning" the distribution of such points via proper linear approximations (LA). We therefore call this novel data structure the la_vector. We prove time and space complexities of the la_vector in several scenarios: in the worst case, in the case of input distributions with finite mean and variance, and taking into account the kth order entropy of some of its building blocks. We also discuss improved hybrid data structures, namely, ones that suitably orchestrate known compressed rank/select dictionaries with the la_vector. We corroborate our theoretical results with a large set of experiments over datasets originating from a variety of applications (Web search, DNA sequencing, information retrieval, and natural language processing) and show that our approach provides new interesting space-time tradeoffs with respect to many well-established compressed rank/select dictionary implementations. In particular, we show that our select is the fastest, and our rank is on the space-time Pareto frontier.
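The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' la_vector implementation: the class name LearnedRankSelect, the greedy "shrinking cone" segmentation, the parameter eps, and the conventions used here (select(i) returns the i-th smallest integer for 1-based i, rank(x) counts the integers not larger than x) are all assumptions made for illustration. The sketch maps the i-th integer to the point (i, x_i), covers the points with line segments that stay within roughly eps of every point, and stores one small integer correction per element so that select remains exact.

```python
from bisect import bisect_right


class LearnedRankSelect:
    """Toy rank/select dictionary over a strictly increasing list of integers.

    Hypothetical illustration only: each segment is a line anchored at its
    first point, and a per-element correction makes select exact.
    """

    def __init__(self, values, eps=4):
        self.n = len(values)
        self.eps = eps
        self.starts, self.bases, self.slopes = [], [], []
        self.corrections = []
        i = 0
        while i < self.n:
            start, base = i, values[i]
            lo, hi = 0.0, float("inf")  # feasible slope interval (the cone)
            j = i + 1
            while j < self.n:
                dx = j - start
                new_lo = max(lo, (values[j] - base - eps) / dx)
                new_hi = min(hi, (values[j] - base + eps) / dx)
                if new_lo > new_hi:  # no single slope fits: close the segment
                    break
                lo, hi = new_lo, new_hi
                j += 1
            slope = 0.0 if hi == float("inf") else (lo + hi) / 2
            self.starts.append(start)
            self.bases.append(base)
            self.slopes.append(slope)
            seg = len(self.starts) - 1
            # Store the residual between each true value and its prediction.
            for k in range(start, j):
                self.corrections.append(values[k] - self._predict(seg, k))
            i = j

    def _predict(self, seg, k):
        """Approximate the value at 0-based position k using segment seg."""
        return self.bases[seg] + round(self.slopes[seg] * (k - self.starts[seg]))

    def select(self, i):
        """Return the i-th smallest integer (1-based)."""
        k = i - 1
        seg = bisect_right(self.starts, k) - 1
        return self._predict(seg, k) + self.corrections[k]

    def rank(self, x):
        """Return how many stored integers are <= x (binary search on select)."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.select(mid + 1) <= x:
                lo = mid + 1
            else:
                hi = mid
        return lo
```

In this sketch the space saving would come from storing each correction in a few bits instead of a full machine word, which the plain Python list does not attempt to show. The rank shown here is a plain binary search over select, chosen for brevity rather than speed.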
Pages: 28
Related papers
50 in total
  • [31] Compressed Dynamic Range Majority Data Structures
    Gagie, Travis
    He, Meng
    Navarro, Gonzalo
    2017 DATA COMPRESSION CONFERENCE (DCC), 2017, : 260 - 269
  • [32] Compressed data structures: Dictionaries and data-aware measures
    Gupta, Ankur
    Hon, Wing-Kai
    Shah, Rahul
    Vitter, Jeffrey Scott
    THEORETICAL COMPUTER SCIENCE, 2007, 387 (03) : 313 - 331
  • [33] Compressed data structures: Dictionaries and data-aware measures
    Gupta, Ankur
    Hon, Wing-Kai
    Shah, Rahul
    Vitter, Jeffrey Scott
    DCC 2006: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2006, : 213 - +
  • [34] Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space
    Kempa, Dominik
    Kociumaka, Tomasz
    2023 IEEE 64TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, FOCS, 2023, : 1877 - 1886
  • [35] A Simple Approach to Jointly Rank Passages and Select Relevant Sentences in the OBQA Context
    Ferguson, Alex
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 181 - 187
  • [36] Compressed Dynamic Range Majority and Minority Data Structures
    Gagie, Travis
    He, Meng
    Navarro, Gonzalo
    ALGORITHMICA, 2020, 82 (07) : 2063 - 2086
  • [37] Algorithms and data structures for compressed-memory machines
    Franaszek, PA
    Heidelberger, P
    Poff, DE
    Robinson, JT
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2001, 45 (02) : 245 - 258
  • [38] Compressed Dynamic Range Majority and Minority Data Structures
    Travis Gagie
    Meng He
    Gonzalo Navarro
    Algorithmica, 2020, 82 : 2063 - 2086
  • [39] An efficient data mining approach on compressed transactions
    Dai, Jia-Yu
    Yang, Don-Lin
    Wu, Jungpin
    Hung, Ming-Chuan
    World Academy of Science, Engineering and Technology, 2009, 40 : 522 - 529
  • [40] Formal Verification of the rank Algorithm for Succinct Data Structures
    Tanaka, Akira
    Affeldt, Reynald
    Garrigue, Jacques
    FORMAL METHODS AND SOFTWARE ENGINEERING, ICFEM 2016, 2016, 10009 : 243 - 260