Progressive Partitioning for Parallelized Query Execution in Google's Napa

Authors
Tatemura, Junichi [1]
Zou, Tao [1]
Sankaranarayanan, Jagan [1]
Huang, Yanlai [1]
Chen, Jim [1]
Zhang, Yupu [1]
Lai, Kevin [1]
Zhang, Hao [1]
Manoharan, Gokul Nath Babu [1]
Graefe, Goetz [1]
Agrawal, Divyakant [1]
Adelberg, Brad [1]
Kolhar, Shilpa [1]
Roy, Indrajit [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023, Vol. 16, No. 12
Keywords
DATABASE; CRACKING
DOI
10.14778/3611540.3611541
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Napa holds Google's critical data warehouses in log-structured merge trees for real-time data ingestion and sub-second response for billions of queries per day. These queries are often multi-key look-ups in highly skewed tables and indexes. In our production experience, only progressive query-specific partitioning can achieve Napa's strict query latency SLOs. Here we advocate good-enough partitioning that keeps the per-query partitioning time low without risking uneven work distribution. Our design combines pragmatic system choices and algorithmic innovations. For instance, B-trees are augmented with statistics of key distributions, thus serving the dual purpose of aiding lookups and partitioning. Furthermore, progressive partitioning is designed to be "good enough" thereby balancing partitioning time with performance. The resulting system is robust and successfully serves day-in-day-out billions of queries with very high quality of service forming a core infrastructure at Google.
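The idea of a B-tree that carries key-distribution statistics and is refined only until the split is "good enough" can be illustrated with a small sketch. This is not Napa's implementation; the `Node` type, the `progressive_partition` function, and the `slack` and `max_steps` parameters are all hypothetical names chosen for illustration, and the stopping rule is an assumption. Each node stores a row count for its key range, and the partitioner descends only into the heaviest range until no range exceeds a tolerated share of the work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node annotated with a key-distribution statistic."""
    lo: int      # smallest key covered by this subtree
    hi: int      # largest key covered (inclusive)
    count: int   # number of rows under this subtree (the stored statistic)
    children: List["Node"] = field(default_factory=list)


def progressive_partition(root: Node, workers: int,
                          slack: float = 1.5, max_steps: int = 16) -> List[Node]:
    """Split key ranges top-down until the heaviest one is 'good enough'.

    Assumed stopping rule: stop once the heaviest range holds at most
    slack * (total rows / workers), once it cannot be split further,
    or once the step budget is used up.
    """
    target = root.count / workers
    pieces = list(root.children) or [root]
    for _ in range(max_steps):
        # Find the heaviest range using only the stored counts.
        i = max(range(len(pieces)), key=lambda j: pieces[j].count)
        heaviest = pieces[i]
        if heaviest.count <= slack * target or not heaviest.children:
            break
        # Refine only the heaviest range; lighter ranges are left coarse.
        pieces[i:i + 1] = heaviest.children
    return pieces


# A skewed example: 80 of 100 rows fall in the middle key range.
hot = Node(10, 19, 80, [Node(10, 12, 20), Node(13, 15, 20),
                        Node(16, 17, 20), Node(18, 19, 20)])
root = Node(0, 29, 100, [Node(0, 9, 10), hot, Node(20, 29, 10)])

ranges = [(p.lo, p.hi, p.count) for p in progressive_partition(root, workers=4)]
print(ranges)
# [(0, 9, 10), (10, 12, 20), (13, 15, 20), (16, 17, 20), (18, 19, 20), (20, 29, 10)]
```

In this sketch a single refinement step is enough: the hot range is split into four pieces of 20 rows each, while the two cold ranges are never opened, which keeps the partitioning work proportional to the skew rather than to the size of the tree.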
Pages: 3475-3487
Page count: 13