CAT: A Cost-Aware Translator for SQL-query workflow to MapReduce jobflow

Cited by: 1
Authors
Song, Aibo [1 ,2 ]
Wu, Zhiang [3 ]
Ma, Xu [1 ,2 ]
Luo, Junzhou [1 ]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing, Jiangsu, Peoples R China
[3] Nanjing Univ Finance & Econ, Jiangsu Prov Key Lab E Business, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
MapReduce; SQL-to-MapReduce; Intra-SQL correlations; Cost estimation model; Hadoop; Query;
DOI
10.1016/j.datak.2015.12.004
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
MapReduce is undoubtedly the most popular framework for large-scale processing and analysis of vast data sets on clusters of machines. To make MapReduce easier to use, SQL-like declarative languages and SQL-to-MapReduce translators have attracted increasing attention recently. An SQL-to-MapReduce translator automatically generates a MapReduce jobflow for each SQL query submitted by users, which significantly simplifies the interface between users and systems. Although a plethora of translators have been developed, the auto-generated MapReduce programs still suffer from extreme inefficiency. In this paper, we attempt to address this challenge by developing a novel Cost-Aware Translator (CAT). CAT has two notable features. First, it defines two intra-SQL correlations, Generalized Job Flow Correlation (GJFC) and Input Correlation (IC), on the basis of which a set of looser merging rules is introduced. Building on these rules, both Top-Down (TD) and Bottom-Up (BU) merging strategies are proposed and integrated into CAT simultaneously. Second, it adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient of the MapReduce jobflows automatically generated by the TD and BU merging strategies. Finally, comparative experiments on the TPC-H benchmark demonstrate the effectiveness and scalability of CAT. (C) 2016 Elsevier B.V. All rights reserved.
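The abstract describes CAT's final step only at a high level: the TD and BU merging strategies each produce a candidate MapReduce jobflow, and a cost estimation model selects the cheaper one. The sketch below illustrates only that selection step, under stated assumptions. The `Job` and `Jobflow` types, the fixed per-job startup overhead, and the per-GB read/shuffle/write weights are hypothetical placeholders chosen for illustration; they are not the cost model defined in the paper.

```python
from dataclasses import dataclass

GB = 1 << 30  # bytes per gigabyte

# Hypothetical toy cost model, NOT the cost model from the CAT paper.
STARTUP_COST = 10.0  # assumed fixed overhead for launching one MapReduce job
READ_COST, SHUFFLE_COST, WRITE_COST = 1.0, 2.0, 1.5  # assumed cost per GB


@dataclass(frozen=True)
class Job:
    name: str
    input_bytes: int    # data read by the map phase
    shuffle_bytes: int  # data shuffled from mappers to reducers
    output_bytes: int   # data written by the reduce phase


@dataclass(frozen=True)
class Jobflow:
    strategy: str       # e.g. "TD" or "BU"
    jobs: tuple         # the MapReduce jobs making up this jobflow


def job_cost(job: Job) -> float:
    """Estimate one job's cost as startup overhead plus weighted I/O volume."""
    return (STARTUP_COST
            + READ_COST * job.input_bytes / GB
            + SHUFFLE_COST * job.shuffle_bytes / GB
            + WRITE_COST * job.output_bytes / GB)


def jobflow_cost(flow: Jobflow) -> float:
    """Estimate a jobflow's cost as the sum of its jobs' costs."""
    return sum(job_cost(j) for j in flow.jobs)


def choose_jobflow(candidates):
    """Return the candidate jobflow with the lowest estimated cost."""
    return min(candidates, key=jobflow_cost)


# Illustrative candidates: TD keeps two jobs, BU merged them into one job.
td = Jobflow("TD", (Job("j1", 4 * GB, 2 * GB, 1 * GB),
                    Job("j2", 1 * GB, 1 * GB, 1 * GB)))
bu = Jobflow("BU", (Job("j12", 4 * GB, 3 * GB, 1 * GB),))

best = choose_jobflow([td, bu])
print(best.strategy, jobflow_cost(td), jobflow_cost(bu))  # BU 34.0 21.5
```

In this toy setting the merged BU jobflow wins because it pays the per-job startup overhead once and avoids re-reading intermediate data; a real cost model would account for many more factors.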
Pages: 42-56
Number of pages: 15
Related papers
32 in total
  • [1] Efficiently Translating Complex SQL Query to MapReduce Jobflow on Cloud
    Wu, Zhiang
    Song, Aibo
    Cao, Jie
    Luo, Junzhou
    Zhang, Lu
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (02) : 508 - 517
  • [2] CAT: A Cost-Aware BitTorrent
    Yamazaki, Shusuke
    Tode, Hideki
    Murakami, Koso
    [J]. IEICE TRANSACTIONS ON COMMUNICATIONS, 2008, E91B (12) : 3831 - 3841
  • [3] Cost-aware query planning for similarity search
    Lange, Dustin
    Naumann, Felix
    [J]. INFORMATION SYSTEMS, 2013, 38 (04) : 455 - 469
  • [4] Privacy-aware and cost-aware workflow scheduling in clouds
    Wen Y.
    Liu J.
    Chen C.
    [J]. Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2016, 22 (02) : 294 - 301
  • [5] A Cost-Aware Scheduling Algorithm for Reliable Workflow in IaaS Clouds
    Ye, Lingjuan
    Xia, Yuanqing
    Yang, Liwen
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021 : 275 - 280
  • [6] Cost-aware and privacy-aware workflow scheduling strategy in hybrid clouds
    Wen Y.
    Wang Z.
    Liu J.
    Xu X.
    Chen A.
    Cao B.
    [J]. Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2020, 26 (06) : 1582 - 1588
  • [7] Cost-aware load balancing for multilingual record linkage using MapReduce
    Medhat, Doaa
    Yousef, Ahmed H.
    Salama, Cherif
    [J]. AIN SHAMS ENGINEERING JOURNAL, 2020, 11 (02) : 419 - 433
  • [8] Deadline-constrained cost-aware workflow scheduling in hybrid cloud
    Hussain, Mehboob
    Luo, Ming-Xing
    Hussain, Abid
    Javed, Muhammad Hafeez
    Abbas, Zeeshan
    Wei, Lian-Fu
    [J]. SIMULATION MODELLING PRACTICE AND THEORY, 2023, 129
  • [9] Cost-aware cloud workflow scheduling using DRL and simulated annealing
    Gu, Yan
    Cheng, Feng
    Yang, Lijie
    Xu, Junhui
    Chen, Xiaomin
    Cheng, Long
    [J]. Digital Communications and Networks, 2024, 10 (06) : 1590 - 1599