NEW ALGORITHMS FOR PARALLELIZING RELATIONAL DATABASE JOINS IN THE PRESENCE OF DATA SKEW

Cited by: 10
Authors
WOLF, JL
DIAS, DM
YU, PS
TUREK, J
Affiliation
[1] IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY, 10598
Keywords
ALGORITHMS; DATABASES; DATA SKEW; JOINS; OPTIMIZATION; PARALLEL PROCESSING;
DOI
10.1109/69.334888
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Parallel processing is an attractive option for relational database systems. As in any parallel environment, however, load balancing is a critical issue that affects overall performance. Load balancing for one common database operation in particular, the join of two relations, can be severely hampered for conventional parallel algorithms, due to a natural phenomenon known as data skew. In a pair of recent papers we described two new join algorithms designed to address the data skew problem. In this paper we propose significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. The paper then focuses on the comparative performance of the improved algorithms and their more conventional counterparts. The new algorithms outperform their more conventional counterparts in the presence of just about any skew at all, dramatically so in cases of high skew.
Pages: 990-997
Page count: 8