Forecasting the Cost of Processing Multi-join Queries via Hashing for Main-memory Databases

Cited by: 14
Authors
Liu, Feilong [1]
Blanas, Spyros [1]
Affiliations
[1] Ohio State Univ, Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
OPTIMIZATION;
DOI
10.1145/2806777.2806944
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2x slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10x as the number of joins in the query increases.
Pages: 153-166
Page count: 14
Related papers
8 in total
  • [1] Distributed multi-level recovery in main-memory databases
    Bohannon, P
    Parker, J
    Rastogi, R
    Seshadri, S
    Silberschatz, A
    Sudarshan, S
    [J]. PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED INFORMATION SYSTEMS, 1996, : 44 - 55
  • [2] Distributed multi-level recovery in main-memory databases
    Rastogi, R
    Bohannon, P
    Parker, J
    Silberschatz, A
    Seshadri, S
    Sudarshan, S
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 1998, 6 (01) : 41 - 71
  • [3] Distributed Multi-Level Recovery in Main-Memory Databases
    Rajeev Rastogi
    Philip Bohannon
    James Parker
    Avi Silberschatz
    S. Seshadri
    S. Sudarshan
    [J]. Distributed and Parallel Databases, 1998, 6 : 41 - 71
  • [4] SKEW HANDLING STRATEGIES FOR PIPELINED PROCESSING OF MULTI-JOIN QUERIES IN SHARED-NOTHING SYSTEMS
    TAN, KL
    LU, HJ
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 1995, 10 (01): 3 - 18
  • [5] Cost-based solution for optimizing multi-join queries over distributed streaming sensor data
    Gomes, Joseph
    Choi, Hyeong-Ah
    [J]. 2006 INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, 2006, : 282 - +
  • [6] PolyHJ: A Polymorphic Main-Memory Hash Join Paradigm for Multi-Core Machines
    Khattab, Omar
    Hammoud, Mohammad
    Shekfeh, Omar
    [J]. CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1323 - 1332
  • [7] DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases
    Ahmad, Yanif
    Koch, Christoph
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (02): 1566 - 1569
  • [8] Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries
    Schaeler, Martin
    Tex, Christine
    Koeppen, Veit
    Broneske, David
    Saake, Gunter
    [J]. INFORMATION SYSTEMS, 2021, 101