Cost-Based Join Algorithm Selection in Hadoop

Times cited: 0
Authors
Gu, Jun [1]
Peng, Shu [1]
Wang, X. Sean [1]
Rao, Weixiong [2]
Yang, Min [1]
Cao, Yu [3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Tongji Univ, Sch Software Engn, Shanghai, Peoples R China
[3] EMC Labs, Beijing, Peoples R China
Keywords
Join algorithm; Cost model; Hadoop; Hive
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In recent years, MapReduce has become a popular computing framework for big data analysis. Join is a major query type in data analysis, and various algorithms have been designed to process join queries on top of Hadoop. Because the efficiency of each algorithm depends on the join task at hand, achieving good performance requires users to select an appropriate algorithm and run it with a proper configuration, which is rather difficult for many end users. This paper proposes a cost model that estimates the cost of four popular join algorithms. Based on this model, the system can automatically choose the join algorithm with the least estimated cost and then suggest reasonable configuration values for the chosen algorithm. Experimental results on the TPC-H benchmark show that the proposed method correctly chooses the best join algorithm, and that the chosen algorithm achieves a speedup of about 1.25 times over the default join algorithm.
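To make the selection step concrete, here is a minimal sketch of cost-based join selection in Python. It does not reproduce the paper's cost model or its four algorithms, which the abstract does not spell out. The two candidate strategies (a reduce-side repartition join and a map-side broadcast join), the per-MB unit costs, the memory-feasibility rule, and every name below (`JoinInput`, `choose_join`, and so on) are hypothetical assumptions chosen only to illustrate the idea: estimate a cost for each candidate and pick the cheapest feasible one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class JoinInput:
    # Hypothetical description of a two-table join task (sizes in MB).
    left_mb: float
    right_mb: float
    num_map_tasks: int
    task_memory_mb: float


# Hypothetical per-MB unit costs; a real model would calibrate these on the cluster.
READ_COST = 1.0
SHUFFLE_COST = 3.0
BROADCAST_COST = 2.0


def repartition_join_cost(j: JoinInput) -> Optional[float]:
    # Toy model: both inputs are read by mappers and shuffled to reducers by join key.
    total = j.left_mb + j.right_mb
    return total * (READ_COST + SHUFFLE_COST)


def broadcast_join_cost(j: JoinInput) -> Optional[float]:
    # Toy model: the smaller input is copied to every map task and held in memory.
    # Returns None (infeasible) if the smaller input does not fit in task memory.
    small = min(j.left_mb, j.right_mb)
    if small > j.task_memory_mb:
        return None
    read = (j.left_mb + j.right_mb) * READ_COST
    return read + small * j.num_map_tasks * BROADCAST_COST


# Hypothetical candidate set; the paper considers four algorithms, not these two.
CANDIDATES: Dict[str, Callable[[JoinInput], Optional[float]]] = {
    "repartition": repartition_join_cost,
    "broadcast": broadcast_join_cost,
}


def choose_join(j: JoinInput) -> str:
    # Estimate every candidate, drop infeasible ones, return the cheapest.
    costs = {name: cost_fn(j) for name, cost_fn in CANDIDATES.items()}
    feasible = {name: c for name, c in costs.items() if c is not None}
    return min(feasible, key=feasible.get)


if __name__ == "__main__":
    print(choose_join(JoinInput(10000, 50, 80, 1024)))    # small table fits in memory
    print(choose_join(JoinInput(10000, 5000, 80, 1024)))  # neither table fits
```

Under these toy numbers, a large fact table joined with a 50 MB table favors the broadcast join (10050 + 50 × 80 × 2 = 18050 versus 10050 × 4 = 40200), while two large tables fall back to the repartition join because the broadcast join is infeasible. Keeping each estimator as a separate function makes it easy to add further candidates or recalibrated unit costs without changing the selection logic.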
Pages: 246-261
Page count: 16