Track Join: Distributed Joins with Minimal Network Traffic

Cited by: 47
Authors
Polychroniou, Orestis [1,2]
Sen, Rajkumar [2]
Ross, Kenneth A. [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Oracle Labs, Redwood Shores, CA USA
Funding
U.S. National Science Foundation
Keywords
MULTI-CORE; HASH; SEMIJOINS;
DOI
10.1145/2588555.2610521
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs, but could avoid redundant transfers of tuples across the network. We introduce track join, a novel distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Our evaluation based on real and synthetic data shows that track join adapts to diverse cases and degrees of locality. Considering both network traffic and execution time, even with no locality, track join outperforms hash join on the most expensive queries of real workloads.
Pages: 1483-1494
Page count: 12