Fine-grained Partitioning for Aggressive Data Skipping

Cited by: 57
Authors
Sun, Liwen [1]
Franklin, Michael J. [1]
Krishnan, Sanjay [1]
Xin, Reynold S. [2]
Affiliations
[1] Univ Calif Berkeley, AMPLab, Berkeley, CA 94720 USA
[2] Databricks Inc, San Francisco, CA USA
Keywords
Data warehouse; Partitioning; Query processing; Algorithms; Selection
DOI
10.1145/2588555.2610515
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern query engines are increasingly being required to process enormous datasets in near real-time. While much can be done to speed up data access, a promising technique is to reduce the need to access data through data skipping. By maintaining some metadata for each block of tuples, a query may skip a data block if the metadata indicates that the block does not contain relevant data. The effectiveness of data skipping, however, depends on how well the blocking scheme matches the query filters. In this paper, we propose a fine-grained blocking technique that reorganizes the data tuples into blocks with the goal of enabling queries to skip blocks aggressively. We first extract representative filters in a workload as features using frequent itemset mining. Based on these features, each data tuple can be represented as a feature vector. We then formulate the blocking problem as an optimization problem on the feature vectors, called Balanced MaxSkip Partitioning, which we prove is NP-hard. To find an approximate solution efficiently, we adopt the bottom-up clustering framework. We prototyped our blocking techniques on Shark, an open-source data warehouse system. Our experiments on TPC-H and a real-world workload show that our blocking technique leads to a 2-5x improvement in query response time over traditional range-based blocking techniques.
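The skipping idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the rows, the two filter features, and the helper names (`feature_vector`, `block_vector`, `blocks_scanned`) are all hypothetical, and sorting rows by feature vector is only a simple stand-in for the paper's bottom-up clustering. The sketch assumes each block stores the OR of its tuples' feature vectors as metadata, so a query whose filter is covered by feature `i` can skip any block whose bit `i` is false.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Feature = Callable[[Row], bool]

# Hypothetical workload features (in the paper these come from frequent itemset mining).
FEATURES: List[Feature] = [
    lambda r: r["region"] == "EU",   # feature 0
    lambda r: r["year"] >= 2014,     # feature 1
]

ROWS: List[Row] = [
    {"region": "EU", "year": 2013},
    {"region": "US", "year": 2014},
    {"region": "EU", "year": 2014},
    {"region": "US", "year": 2013},
]


def feature_vector(row: Row, features: List[Feature]) -> Tuple[bool, ...]:
    """Bit i is True when the row satisfies feature i."""
    return tuple(f(row) for f in features)


def block_vector(block: List[Row], features: List[Feature]) -> Tuple[bool, ...]:
    """Per-block metadata: OR of the feature vectors of the block's rows."""
    return tuple(any(f(r) for r in block) for f in features)


def make_blocks(rows: List[Row], size: int) -> List[List[Row]]:
    """Cut the row list into consecutive blocks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def blocks_scanned(blocks: List[List[Row]], features: List[Feature], i: int) -> int:
    """Count blocks a query covered by feature i must read (bit i is True)."""
    return sum(1 for b in blocks if block_vector(b, features)[i])


# Blocking in arrival order versus blocking after grouping similar vectors.
arrival = make_blocks(ROWS, 2)
grouped = make_blocks(
    sorted(ROWS, key=lambda r: feature_vector(r, FEATURES), reverse=True), 2
)

scanned_arrival = blocks_scanned(arrival, FEATURES, 0)
scanned_grouped = blocks_scanned(grouped, FEATURES, 0)
print(scanned_arrival, scanned_grouped)
```

In this toy data, a query filtering on `region == "EU"` must read both arrival-order blocks but only one of the grouped blocks, which is the effect that a blocking scheme matched to the query filters aims for.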
Pages: 1115-1126
Number of pages: 12
Related papers
50 in total
  • [1] Fine-grained Data Partitioning Framework for Distributed Database Systems
    Xu, Ning
    Cui, Bin
    [J]. WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 57 - 61
  • [2] A Partitioning Framework for Aggressive Data Skipping
    Sun, Liwen
    Krishnan, Sanjay
    Xin, Reynold S.
    Franklin, Michael J.
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (13): : 1617 - 1620
  • [3] Fine-grained Program Partitioning for Security
    Huang, Zhen
    Jaeger, Trent
    Tan, Gang
    [J]. PROCEEDINGS OF THE 14TH EUROPEAN WORKSHOP ON SYSTEMS SECURITY (EUROSEC 2021), 2021, : 21 - 26
  • [4] Partitioning Techniques for Fine-grained Indexing
    Wu, Eugene
    Madden, Samuel
    [J]. IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 1127 - 1138
  • [5] Partitioning of arsenic species in fine-grained soils
    Kuhlmeier, PD
    [J]. JOURNAL OF THE AIR & WASTE MANAGEMENT ASSOCIATION, 1997, 47 (04): : 481 - 490
  • [6] Aggressive Fine-Grained Power Gating of NoC Buffers
    Wu, Yibo
    Liu, Leibo
    Wang, Liang
    Wang, Xiaohang
    Han, Jie
    Deng, Chenchen
    Wei, Shaojun
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39 (11) : 3177 - 3189
  • [7] A Fine-Grained SDN Rule Table Partitioning and Distribution
    Yoshikawa, Yutaro
    Arai, Masayuki
    [J]. 2019 IEEE 24TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING (PRDC 2019), 2019, : 89 - 90
  • [8] SCALABLE AND EFFICIENT FINE-GRAINED CACHE PARTITIONING WITH VANTAGE
    Sanchez, Daniel
    Kozyrakis, Christos
    [J]. IEEE MICRO, 2012, 32 (03) : 26 - 37
  • [9] Lookup Tables: Fine-Grained Partitioning for Distributed Databases
    Tatarowicz, Aubrey L.
    Curino, Carlo
    Jones, Evan P. C.
    Madden, Sam
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 102 - 113
  • [10] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Jia Deng
    Krause, Jonathan
    Li Fei-Fei
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587