Multivariate modeling and two-level scheduling of analytic queries

Cited by: 2
Authors
Liu, Zhuo [1]
Nath, Amit Kumar [2]
Ding, Xiaoning [3]
Fu, Huansong [2]
Khan, Md Muhib [2]
Yu, Weikuan [2]
Affiliations
[1] Auburn Univ, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
[2] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[3] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Funding
National Science Foundation (USA)
Keywords
MapReduce; Multivariate modeling; Query scheduling; MANAGEMENT;
DOI
10.1016/j.parco.2019.01.006
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in the DAGs are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy to strive for a balanced distribution of tasks and effective utilization of resources. However, such a simplistic policy is unable to reconcile the dynamics of different jobs in complex analytic queries, resulting in unfair treatment of different queries, low utilization of system resources, prolonged execution time, and low query throughput. Therefore, we introduce a scheduling framework to address these problems systematically. Our framework includes two techniques: multivariate DAG modeling and two-level query scheduling. Cross-layer semantics percolation allows query semantics and job dependencies in the DAG to flow to the MapReduce scheduler. With richer semantics information, we build a multivariate model that can accurately predict the execution time of individual MapReduce jobs and gauge the changing size of analytics datasets through selectivity approximation. Furthermore, we introduce two-level query scheduling that can maximize the intra-query job-level concurrency and, at the same time, speed up the query-level completion time based on the accurate prediction and queuing of queries. At the job level, we focus on detecting query semantics and predicting the query completion time through an online multivariate linear regression model, thereby increasing job-level parallelism and maximizing data sharing across jobs. At the task level, we focus on balanced data distribution, maximal slot utilization, and optimal data locality of task scheduling. Our experimental results on a set of complex query benchmarks demonstrate that our scheduling framework can significantly improve both fairness and throughput of Hive queries. It can improve query response time by up to 43.9% and 72.8% on average, compared to the Hadoop Fair Scheduling and the Hadoop Capacity Scheduling, respectively. In addition, our two-level scheduler can achieve a query fairness that is, on average, 59.8% better than that of the Hadoop Fair Scheduler. (C) 2019 Elsevier B.V. All rights reserved.
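The abstract mentions an online multivariate linear regression model for predicting job execution time but gives no details of it. The following is only a minimal sketch of that general idea, not the authors' model: it assumes two hypothetical features per job (input size in GB and number of tasks), invented runtimes, and an ordinary least-squares fit that is refreshed as each completed job is observed. The class name and the small `ridge` stabilizer are likewise assumptions made for illustration.

```python
from typing import List, Sequence


class OnlineLinearRegression:
    """Least-squares fit kept up to date by accumulating X^T X and X^T y."""

    def __init__(self, n_features: int, ridge: float = 1e-9) -> None:
        self.d = n_features + 1  # one extra slot for the intercept
        # A accumulates X^T X; the tiny ridge term keeps it invertible early on.
        self.A: List[List[float]] = [
            [ridge if i == j else 0.0 for j in range(self.d)]
            for i in range(self.d)
        ]
        # b accumulates X^T y.
        self.b: List[float] = [0.0] * self.d

    def update(self, features: Sequence[float], runtime: float) -> None:
        """Fold one observed (features, runtime) pair into the running sums."""
        x = [1.0, *features]
        for i in range(self.d):
            self.b[i] += x[i] * runtime
            for j in range(self.d):
                self.A[i][j] += x[i] * x[j]

    def coefficients(self) -> List[float]:
        """Solve A w = b by Gaussian elimination with partial pivoting."""
        n = self.d
        m = [row[:] + [rhs] for row, rhs in zip(self.A, self.b)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(col + 1, n):
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
        w = [0.0] * n
        for i in reversed(range(n)):
            tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
            w[i] = (m[i][n] - tail) / m[i][i]
        return w

    def predict(self, features: Sequence[float]) -> float:
        w = self.coefficients()
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))


# Hypothetical history: (input_gb, num_tasks) -> observed runtime in seconds.
history = [
    ((1.0, 4.0), 9.0),
    ((2.0, 8.0), 13.0),
    ((3.0, 6.0), 14.0),
    ((4.0, 10.0), 18.0),
    ((5.0, 2.0), 16.0),
]
model = OnlineLinearRegression(n_features=2)
for job_features, seconds in history:
    model.update(job_features, seconds)

print(round(model.predict((6.0, 12.0)), 3))
```

Accumulating the normal equations makes each update cheap and independent of how many jobs have been seen so far, which is one plausible way to keep a predictor current inside a scheduler; a recursive least-squares or gradient-based update would be an equally reasonable choice.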
Pages: 66-78
Page count: 13