Track Join: Distributed Joins with Minimal Network Traffic

Cited by: 47
Authors
Polychroniou, Orestis [1, 2]
Sen, Rajkumar [2]
Ross, Kenneth A. [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Oracle Labs, Redwood Shores, CA USA
Funding
National Science Foundation (NSF)
Keywords
MULTI-CORE; HASH; SEMIJOINS
DOI
10.1145/2588555.2610521
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs, but could avoid redundant transfers of tuples across the network. We introduce track join, a novel distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Our evaluation based on real and synthetic data shows that track join adapts to diverse cases and degrees of locality. Considering both network traffic and execution time, even with no locality, track join outperforms hash join on the most expensive queries of real workloads.
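The abstract's central idea, choosing a network transfer schedule separately for each distinct join key, can be illustrated with a small cost model. The Python sketch below is a simplified assumption, not the authors' algorithm. For one key, it takes hypothetical per-node tuple counts `r` and `s` and tuple widths `wr`/`ws` in bytes. It then compares the bytes a hash join would move to one hash-chosen node against two selective-broadcast schedules: ship R tuples only to nodes holding matching S tuples, or the reverse, keeping the cheaper of the two. The sketch ignores the metadata messages needed to learn where each key's tuples live, and it does not model more elaborate schedules such as first consolidating tuples onto fewer nodes.

```python
from typing import Dict, Sequence, Tuple


def hash_join_cost(r: Sequence[int], s: Sequence[int], wr: int, ws: int, dest: int) -> int:
    """Bytes moved if every R and S tuple with this key is shipped to node `dest`
    (the node the key hashes to); tuples already on `dest` stay local."""
    return sum(wr * r[i] + ws * s[i] for i in range(len(r)) if i != dest)


def broadcast_cost(src: Sequence[int], dst: Sequence[int], w_src: int) -> int:
    """Bytes moved by a selective broadcast: each source tuple is sent to every
    node that holds at least one matching destination tuple, except its own node."""
    targets = [j for j, count in enumerate(dst) if count > 0]
    total = 0
    for i, count in enumerate(src):
        remote_targets = sum(1 for j in targets if j != i)
        total += w_src * count * remote_targets
    return total


def best_schedule(r: Sequence[int], s: Sequence[int], wr: int, ws: int) -> Tuple[str, int]:
    """Pick the cheaper broadcast direction for one key (ties go to R->S)."""
    options: Dict[str, int] = {
        "R->S": broadcast_cost(r, s, wr),
        "S->R": broadcast_cost(s, r, ws),
    }
    choice = min(options, key=lambda name: options[name])
    return choice, options[choice]


if __name__ == "__main__":
    # One key on 4 hypothetical nodes: 3 wide R tuples on node 0,
    # one narrow S tuple each on nodes 1 and 2.
    r, s = [3, 0, 0, 0], [0, 1, 1, 0]
    print(hash_join_cost(r, s, wr=100, ws=20, dest=3))  # 340
    print(best_schedule(r, s, wr=100, ws=20))           # ('S->R', 40)
```

In this example, the wide R tuples stay in place and only the narrow S tuples travel, moving 40 bytes instead of the 340 bytes needed to hash everything to node 3. When both sides of a key already sit on the same node, the per-key choice moves nothing at all. This is the kind of locality the abstract says track join adapts to, while a hash join still pays to reach the hash destination.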
Pages: 1483-1494
Number of pages: 12