Multivariate modeling and two-level scheduling of analytic queries

Cited by: 2
Authors
Liu, Zhuo [1]
Nath, Amit Kumar [2]
Ding, Xiaoning [3]
Fu, Huansong [2]
Khan, Md Muhib [2]
Yu, Weikuan [2]
Affiliations
[1] Auburn Univ, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
[2] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[3] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Funding
National Science Foundation (USA);
Keywords
MapReduce; Multivariate modeling; Query scheduling; MANAGEMENT;
DOI
10.1016/j.parco.2019.01.006
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in the DAGs are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy that strives for a balanced distribution of tasks and effective utilization of resources. However, such a simplistic policy cannot reconcile the dynamics of different jobs in complex analytic queries, which results in unfair treatment of different queries, low utilization of system resources, prolonged execution time, and low query throughput. We therefore introduce a scheduling framework that addresses these problems systematically. Our framework includes two techniques: multivariate DAG modeling and two-level query scheduling. Cross-layer semantics percolation allows query semantics and job dependencies in the DAG to flow to the MapReduce scheduler. With this richer semantic information, we build a multivariate model that can accurately predict the execution time of individual MapReduce jobs and gauge the changing size of analytics datasets through selectivity approximation. Furthermore, we introduce two-level query scheduling that maximizes intra-query job-level concurrency and, at the same time, shortens query-level completion time based on accurate prediction and queuing of queries. At the job level, we focus on detecting query semantics and predicting the query completion time through an online multivariate linear regression model, thereby increasing job-level parallelism and maximizing data sharing across jobs. At the task level, we focus on balanced data distribution, maximal slot utilization, and optimal data locality in task scheduling. Our experimental results on a set of complex query benchmarks demonstrate that our scheduling framework can significantly improve both the fairness and the throughput of Hive queries. It can improve query response time by up to 43.9% and 72.8% on average, compared to the Hadoop Fair Scheduler and the Hadoop Capacity Scheduler, respectively. In addition, our two-level scheduler can achieve a query fairness that is, on average, 59.8% better than that of the Hadoop Fair Scheduler. © 2019 Elsevier B.V. All rights reserved.
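The abstract mentions an online multivariate linear regression model for predicting execution time but does not give its features or update rule. The following is a minimal illustrative sketch only, not the authors' implementation: it assumes a recursive least squares update (a swapped-in technique chosen for illustration), and the class name `OnlineLinearRegression`, the parameter `delta`, and the example features are all hypothetical.

```python
class OnlineLinearRegression:
    """Hypothetical sketch: recursive least squares for y ~ w . x.

    A constant 1.0 is appended to every feature vector so that the
    last weight acts as the intercept.
    """

    def __init__(self, n_features, delta=1e6):
        n = n_features + 1  # one extra slot for the intercept
        self.w = [0.0] * n
        # P tracks the inverse of the accumulated feature covariance.
        # A large initial diagonal means "no prior knowledge".
        self.P = [[delta if i == j else 0.0 for j in range(n)]
                  for i in range(n)]

    def predict(self, features):
        x = list(features) + [1.0]
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, features, observed):
        """Fold one (features, observed value) sample into the model."""
        x = list(features) + [1.0]
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = observed - sum(wi * xi for wi, xi in zip(self.w, x))
        self.w = [wi + g * error for wi, g in zip(self.w, gain)]
        # P <- P - gain * (x^T P); P is symmetric, so x^T P equals Px^T.
        self.P = [[self.P[i][j] - gain[i] * Px[j] for j in range(n)]
                  for i in range(n)]


if __name__ == "__main__":
    # Assumed features for illustration: (input size, number of tasks).
    model = OnlineLinearRegression(n_features=2)
    history = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
    for size, tasks in history:
        # Synthetic observations from a made-up linear cost function.
        model.update([size, tasks], 2.0 * size + 0.5 * tasks + 3.0)
    print(model.predict([10.0, 4.0]))
```

Each completed job contributes one `update` call, so the model can be refined while queries are running, which is the property an online predictor needs. The synthetic cost function in the usage example is made up and carries no measured data.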
Pages: 66-78
Number of pages: 13