BF-Tree: Approximate Tree Indexing

Cited by: 43

Authors:

Athanassoulis, Manos [1]

Ailamaki, Anastasia [1]

Affiliations:

[1] Ecole Polytech Fed Lausanne, Lausanne, VD, Switzerland

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 7 / Issue 14

DOI:

10.14778/2733085.2733094

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The increasing volume of time-based generated data and the shift in storage technologies suggest that we might need to reconsider indexing. Several workloads - like social and service monitoring - often include attributes with implicit clustering because of their time-dependent nature. In addition, solid state disks (SSD) (using flash or other low-level technologies) emerge as viable competitors of hard disk drives (HDD). Capacity and access times of storage devices create a trade-off between SSD and HDD. Slow random accesses in HDD have been replaced by efficient random accesses in SSD, but their available capacity is one or more orders of magnitude more expensive than the one of HDD. Indexing, however, is designed assuming HDD as secondary storage, thus minimizing random accesses at the expense of capacity. Indexing data using SSD as secondary storage requires treating capacity as a scarce resource. To this end, we introduce approximate tree indexing, which employs probabilistic data structures (Bloom filters) to trade accuracy for size and produce smaller, yet powerful, tree indexes, which we name Bloom filter trees (BF-Trees). BF-Trees exploit pre-existing data ordering or partitioning to offer competitive search performance. We demonstrate, both by an analytical study and by experimental results, that by using workload knowledge and reducing indexing accuracy up to some extent, we can save substantially on capacity when indexing on ordered or partitioned attributes. In particular, in experiments with a synthetic workload, approximate indexing offers 2.22x-48x smaller index footprint with competitive response times, and in experiments with TPCH and a monitoring real-life dataset from an energy company, it offers 1.6x-4x smaller index footprint with competitive search times as well.
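The core idea in the abstract, trading accuracy for size by keeping a Bloom filter per group of data instead of one index entry per key, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `BloomFilter` and `ApproximateIndex`, the parameters `bits_per_key` and `num_hashes`, the SHA-256 double hashing, and the in-memory "pages" are all assumptions made for illustration. The sketch assumes the data is partitioned by key range, with keys unordered inside each partition.

```python
import hashlib
from bisect import bisect_right


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; double hashing over SHA-256)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = max(1, num_bits)
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ApproximateIndex:
    """Hypothetical approximate index: one Bloom filter per data page.

    `partitions` is a list of partitions ordered by key range; each
    partition is a list of pages, and each page is a list of keys.
    """

    def __init__(self, partitions, bits_per_key=10, num_hashes=4):
        self.partitions = partitions
        self.min_keys = [min(k for page in part for k in page)
                         for part in partitions]
        self.filters = []
        for part in partitions:
            page_filters = []
            for page in part:
                bf = BloomFilter(bits_per_key * len(page), num_hashes)
                for key in page:
                    bf.add(key)
                page_filters.append(bf)
            self.filters.append(page_filters)
        self.pages_read = 0  # counts simulated page accesses

    def lookup(self, key):
        """Return (partition, page) holding `key`, or None if absent."""
        part_no = bisect_right(self.min_keys, key) - 1
        if part_no < 0:
            return None
        for page_no, bf in enumerate(self.filters[part_no]):
            if bf.might_contain(key):
                self.pages_read += 1  # a false positive costs one extra read
                if key in self.partitions[part_no][page_no]:
                    return (part_no, page_no)
        return None


# Two partitions covering keys 0-5 and 10-14; keys are unordered within pages.
index = ApproximateIndex([[[5, 1, 3], [2, 4, 0]], [[12, 10], [11, 13, 14]]])
location = index.lookup(4)  # partition 0, page 1
```

In this sketch the index stores only one minimum key per partition plus a few bits per key, so its size is governed by `bits_per_key` rather than by the key width. A smaller `bits_per_key` shrinks the index but raises the false-positive rate, which shows up as extra entries in `pages_read`.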

Pages: 1881-1892

Page count: 12

