BF-Tree: Approximate Tree Indexing

Cited by: 43
Authors
Athanassoulis, Manos [1]
Ailamaki, Anastasia [1]
Affiliation
[1] Ecole Polytech Fed Lausanne, Lausanne, VD, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 7 / Issue 14
DOI
10.14778/2733085.2733094
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The increasing volume of time-based generated data and the shift in storage technologies suggest that we might need to reconsider indexing. Several workloads, such as social and service monitoring, often include attributes with implicit clustering because of their time-dependent nature. In addition, solid state disks (SSD) (using flash or other low-level technologies) emerge as viable competitors of hard disk drives (HDD). Capacity and access times of storage devices create a trade-off between SSD and HDD. Slow random accesses in HDD have been replaced by efficient random accesses in SSD, but SSD capacity is one or more orders of magnitude more expensive than that of HDD. Indexing, however, is designed assuming HDD as secondary storage, thus minimizing random accesses at the expense of capacity. Indexing data using SSD as secondary storage requires treating capacity as a scarce resource. To this end, we introduce approximate tree indexing, which employs probabilistic data structures (Bloom filters) to trade accuracy for size and produce smaller, yet powerful, tree indexes, which we name Bloom filter trees (BF-Trees). BF-Trees exploit pre-existing data ordering or partitioning to offer competitive search performance. We demonstrate, both by an analytical study and by experimental results, that by using workload knowledge and reducing indexing accuracy to some extent, we can save substantially on capacity when indexing on ordered or partitioned attributes. In particular, in experiments with a synthetic workload, approximate indexing offers a 2.22x-48x smaller index footprint with competitive response times, and in experiments with TPC-H and a real-life monitoring dataset from an energy company, it offers a 1.6x-4x smaller index footprint with competitive search times as well.
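The trade of accuracy for size described in the abstract can be illustrated with a small sketch. It is not the BF-Tree structure from the paper: the class names `BloomFilter` and `ApproximateIndex`, the page and partition sizes, the bits-per-key setting, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration. The sketch assumes unique keys that are already sorted, stores only one minimum key per partition plus one small Bloom filter per data page, and verifies every candidate page so that Bloom filter false positives cost extra page reads but never produce wrong answers.

```python
import bisect
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; hash positions come from one SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class ApproximateIndex:
    """Hypothetical index: one min key per partition, one filter per page."""

    def __init__(self, sorted_keys, page_size=4, pages_per_partition=4,
                 bits_per_key=10, num_hashes=7):
        # Simulated data pages: consecutive runs of the sorted keys.
        self.pages = [sorted_keys[i:i + page_size]
                      for i in range(0, len(sorted_keys), page_size)]
        self.partition_min = []      # smallest key of each partition
        self.partition_filters = []  # per partition: [(page_no, filter), ...]
        for start in range(0, len(self.pages), pages_per_partition):
            end = min(start + pages_per_partition, len(self.pages))
            self.partition_min.append(self.pages[start][0])
            filters = []
            for page_no in range(start, end):
                bf = BloomFilter(bits_per_key * page_size, num_hashes)
                for key in self.pages[page_no]:
                    bf.add(key)
                filters.append((page_no, bf))
            self.partition_filters.append(filters)

    def candidate_pages(self, key):
        # Exact step: locate the partition whose key range may hold the key.
        idx = bisect.bisect_right(self.partition_min, key) - 1
        if idx < 0:
            return []
        # Approximate step: keep only pages whose filter does not rule it out.
        return [page_no for page_no, bf in self.partition_filters[idx]
                if bf.might_contain(key)]

    def search(self, key):
        # Reading a candidate page resolves any Bloom filter false positive.
        for page_no in self.candidate_pages(key):
            if key in self.pages[page_no]:
                return page_no
        return None


keys = list(range(0, 200, 2))  # sorted, unique, even keys
index = ApproximateIndex(keys)
page_of_42 = index.search(42)
page_of_43 = index.search(43)
```

In this sketch the per-page filter size is controlled by `bits_per_key`: lowering it shrinks the index and raises the false positive rate, which shows up as additional candidate pages to read rather than as incorrect results.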
Pages: 1881-1892
Number of pages: 12