Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs

Cited by: 158

Authors:

Kim, Changkyu [1]

Sedlar, Eric [2]

Chhugani, Jatin [1]

Kaldewey, Tim [2]

Nguyen, Anthony D. [1]

Di Blas, Andrea [2]

Lee, Victor W. [1]

Satish, Nadathur [1]

Dubey, Pradeep [1]

Affiliations:

[1] Intel Corp, Throughput Comp Lab, Santa Clara, CA 95054 USA

[2] Oracle Corp, Special Projects Grp, Redwood Shores, CA 94065 USA

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2009 / Vol. 2 / No. 2

Keywords:

DOI:

10.14778/1687553.1687564

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Join is an important database operation. As computer architectures evolve, the best join algorithm may change hands. This paper reexamines two popular join algorithms, hash join and sort-merge join, to determine whether the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantage of the latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second, which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes, from 64K to 128M tuples, and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation, which achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that the current architectural trends of wider SIMD, more cores, and smaller memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multi-core implementations of hash join and sort-merge join that consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.
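For readers unfamiliar with the two algorithms the abstract compares, the following is a minimal, single-threaded sketch of their logical structure only. It is not the authors' implementation: the paper's versions are parallel and use processor features such as SIMD, none of which is modeled here. The sketch assumes each tuple is a hypothetical `(key, payload)` pair of integers and performs an equi-join on `key`. The function names `hash_join` and `sort_merge_join` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed tuple layout for this sketch: (join key, payload).
Row = Tuple[int, int]
Joined = Tuple[int, int, int]  # (key, left/build payload, right/probe payload)


def hash_join(build: List[Row], probe: List[Row]) -> List[Joined]:
    """Equi-join: build a hash table on one input, then probe it with the other."""
    table: Dict[int, List[int]] = defaultdict(list)
    for key, payload in build:            # build phase
        table[key].append(payload)
    out: List[Joined] = []
    for key, payload in probe:            # probe phase
        # .get() does not insert missing keys into the defaultdict
        for b_payload in table.get(key, ()):
            out.append((key, b_payload, payload))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Equi-join: sort both inputs by key, then merge them in a single pass."""
    l, r = sorted(left), sorted(right)
    i = j = 0
    out: List[Joined] = []
    while i < len(l) and j < len(r):
        lk, rk = l[i][0], r[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(l) and l[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == rk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, l[a][1], r[b][1]))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    R = [(1, 10), (2, 20), (2, 21), (4, 40)]
    S = [(2, 200), (3, 300), (4, 400), (4, 401)]
    print(hash_join(R, S))
    print(sort_merge_join(R, S))
```

Both functions produce the same set of joined tuples. The hash join does roughly linear work, but its probes access the hash table randomly. The sort-merge join pays for sorting first, and then reads both inputs sequentially. That trade-off is what the abstract's argument turns on: sorting can exploit wider SIMD and needs less memory bandwidth per core, so sort-merge join may benefit more from the projected hardware trends.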


Pages: 1378-1389

Page count: 12
