Track Join: Distributed Joins with Minimal Network Traffic

Cited by: 47
Authors
Polychroniou, Orestis [1, 2]
Sen, Rajkumar [2]
Ross, Kenneth A. [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Oracle Labs, Redwood Shores, CA USA
Funding
National Science Foundation (USA)
Keywords
MULTI-CORE; HASH; SEMIJOINS;
DOI
10.1145/2588555.2610521
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs, but could avoid redundant transfers of tuples across the network. We introduce track join, a novel distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Our evaluation based on real and synthetic data shows that track join adapts to diverse cases and degrees of locality. Considering both network traffic and execution time, even with no locality, track join outperforms hash join on the most expensive queries of real workloads.
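The idea of choosing where to send tuples separately for each distinct join key can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified model that assumes all tuples of a key are consolidated at a single node, ignores the cost of the tracking messages themselves, and assumes integer keys so that `key % num_nodes` can stand in for the hash-join destination. The names `bytes_moved`, `best_destination`, and `compare_traffic` are hypothetical.

```python
def bytes_moved(sizes, dest):
    """Bytes sent over the network if every tuple of one key is shipped to `dest`.

    `sizes` maps node id -> bytes of matching tuples stored on that node.
    """
    return sum(size for node, size in sizes.items() if node != dest)


def best_destination(sizes):
    """Node that minimizes traffic when one key is consolidated at a single node.

    Simplifying assumption: a single destination per key. Ties are broken by
    the smallest node id so the result is deterministic.
    """
    return min(sorted(sizes), key=lambda node: bytes_moved(sizes, node))


def compare_traffic(key_locations, num_nodes):
    """Return (hash_join_bytes, per_key_bytes) for the whole input.

    `key_locations` maps join key -> {node id: bytes held on that node}.
    The hash-join destination is modeled as `key % num_nodes` (an assumption).
    """
    hash_total = 0
    per_key_total = 0
    for key, sizes in key_locations.items():
        hash_total += bytes_moved(sizes, key % num_nodes)
        per_key_total += bytes_moved(sizes, best_destination(sizes))
    return hash_total, per_key_total


# Toy input: three keys spread over three nodes.
locations = {
    7: {0: 100, 1: 20},
    8: {2: 50},
    9: {0: 10, 1: 10, 2: 30},
}
hash_bytes, per_key_bytes = compare_traffic(locations, num_nodes=3)
print(hash_bytes, per_key_bytes)
```

In this toy input the hash destination for key 7 is the node holding the smaller share of its tuples, so hash partitioning moves more bytes than per-key placement does. When data has no locality the gap narrows, which is the trade-off between extra CPU work and network traffic that the abstract describes.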
Pages: 1483-1494
Page count: 12