To Partition, or Not to Partition, That is the Join Question in a Real System

Cited by: 17
Authors
Bandle, Maximilian [1]
Giceva, Jana [1]
Neumann, Thomas [1]
Affiliation
[1] Tech Univ Munich, Munich, Germany
Funding
European Research Council (ERC)
Keywords
Performance Evaluation; Partitioning; Join Processing; Modern Hardware; In-Memory Databases; Multi-Core; Materialization Strategies; Performance
DOI
10.1145/3448016.3452831
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
An efficient implementation of the hash join has been a heavily researched problem for decades. Recently, the radix join has been shown to outperform the alternatives (e.g., the non-partitioned hash join), albeit only on synthetic microbenchmarks. It is therefore unclear whether one can simply replace the hash join in an RDBMS with the radix join, or should instead use the radix join as a performance booster for selected queries. If the latter, it remains unknown when one should rely on the radix join to improve performance. In this paper, we address these questions, show how to integrate the radix join into Umbra, a code-generating DBMS, and make it competitive for selective queries by introducing a Bloom-filter-based semi-join reducer. We have evaluated how well it performs on queries from more representative workloads such as TPC-H. Surprisingly, the radix join brings a noticeable improvement in only one of the 59 joins in TPC-H. Using an extensive range of microbenchmarks, we have therefore isolated the effects of the most important workload factors and identified the range of parameter values for which partitioning the data for the radix join pays off. Our analysis shows that the benefit of data partitioning quickly diminishes as soon as we deviate from the optimal parameters, and that even late materialization rarely helps in real workloads. We thus conclude that integrating the radix join into a code-generating database rarely justifies the increase in code and optimizer complexity, and we advise against it for processing real-world workloads.
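The abstract contrasts a non-partitioned hash join with the radix join and mentions a Bloom-filter-based semi-join reducer. The Python sketch below illustrates only the data flow of these ideas; it is not Umbra's implementation (Umbra generates native code). All names are hypothetical: the hash function `hash64`, the `BloomFilter` sizing, and the fan-out parameter `radix_bits`. The sketch also assumes that the filter is built over the build side and applied to probe tuples before they are partitioned. The paper's exact placement of the reducer may differ.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1


def hash64(key: int, seed: int = 0) -> int:
    """Hypothetical 64-bit multiplicative hash of an integer key (illustration only)."""
    x = ((key ^ seed) * 0x9E3779B97F4A7C15) & MASK64
    return x ^ (x >> 29)  # fold high bits down so the low bits are better mixed


class BloomFilter:
    """Minimal Bloom filter over integer keys; sizing and hash count are assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: int):
        for i in range(self.num_hashes):
            yield hash64(key, seed=i + 1) % self.num_bits

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: int) -> bool:
        # No false negatives; false positives are possible.
        return all(self.bits[p] for p in self._positions(key))


def hash_join(build, probe):
    """Non-partitioned hash join: one hash table over the entire build side."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]


def radix_partition(tuples, radix_bits: int):
    """Scatter (key, payload) tuples into 2**radix_bits partitions by low hash bits."""
    fanout = 1 << radix_bits
    parts = [[] for _ in range(fanout)]
    for t in tuples:
        parts[hash64(t[0]) & (fanout - 1)].append(t)
    return parts


def radix_join(build, probe, radix_bits: int = 4, bloom=None):
    """Radix join: partition both inputs, then hash-join each pair of partitions.

    If `bloom` is given, it serves as a semi-join reducer: probe tuples whose key
    is definitely absent from the build side are dropped before partitioning.
    """
    if bloom is not None:
        for key, _ in build:
            bloom.add(key)
        probe = [t for t in probe if bloom.might_contain(t[0])]
    build_parts = radix_partition(build, radix_bits)
    probe_parts = radix_partition(probe, radix_bits)
    out = []
    for bp, pp in zip(build_parts, probe_parts):
        # In a real system, each per-partition hash table is meant to fit in cache.
        out.extend(hash_join(bp, pp))
    return out
```

Both strategies return the same set of result tuples. For example, `sorted(radix_join(build, probe, radix_bits=3, bloom=BloomFilter()))` equals `sorted(hash_join(build, probe))`. In a real system, partitioning keeps each per-partition hash table small enough to stay cache-resident, but it costs an extra pass that scatters both inputs. The Bloom filter lets the probe side skip partitioning tuples that cannot find a match. A Python sketch cannot show these cache effects. It only shows that the two plans are equivalent in output.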
Pages: 168-180
Page count: 13