Accelerating Big Data Analytics Using Scale-up/out Heterogeneous Clusters

Cited by: 1
Authors
Li, Zhuozhao [1]
Shen, Haiying [2]
Ward, Lee [3]
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[3] Sandia Natl Labs, Livermore, CA 94550 USA
DOI
10.1109/icccn.2019.8847060
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Production data analytics workloads typically consist of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. Recent work advocates scale-up/scale-out heterogeneous clusters (Hybrid clusters for short) to handle these heterogeneous workloads, since scale-up machines (i.e., machines built by adding more resources to a single machine) can process small jobs faster than a cluster that is simply scaled out with cheap machines. However, implementing such a Hybrid cluster poses several challenges for job placement and data placement. In this paper, we propose a job placement strategy and a data placement strategy to address these challenges. The job placement strategy places a job on either scale-up or scale-out machines based on the job's characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas across the two types of machines accordingly to increase data locality in the Hybrid cluster. We implemented a Hybrid cluster on Apache YARN and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster can reduce the makespan of the workload by up to 37% and the median job completion time by up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers.
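The job placement idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only says that placement depends on "the job's characteristics", so the sketch assumes input size is the deciding characteristic. The 1 GiB cutoff, the 0.5 utilization watermark, and the names `Job`, `place_job`, and `migrate_if_idle` are all hypothetical. Data placement is not covered.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the abstract does not state how jobs are classified.
SMALL_JOB_BYTES = 1 * 1024**3  # 1 GiB, assumed for illustration only


@dataclass
class Job:
    name: str
    input_bytes: int


def place_job(job, threshold=SMALL_JOB_BYTES):
    """Send small-input jobs to scale-up machines and the rest to scale-out."""
    return "scale-up" if job.input_bytes < threshold else "scale-out"


def migrate_if_idle(scale_up_queue, scale_out_queue,
                    scale_out_utilization, low_watermark=0.5):
    """Move one queued job from scale-up to scale-out when scale-out is idle.

    Assumed policy: "under-utilized" means utilization below low_watermark,
    and the most recently queued scale-up job is the one that moves.
    Returns the migrated job, or None if nothing was moved.
    """
    if scale_up_queue and scale_out_utilization < low_watermark:
        job = scale_up_queue.pop()
        scale_out_queue.append(job)
        return job
    return None


if __name__ == "__main__":
    jobs = [Job("grep", 64 * 1024**2), Job("sort", 50 * 1024**3)]
    for j in jobs:
        print(j.name, "->", place_job(j))
```

A real scheduler would also have to weigh where a job's input replicas live before migrating it, which is the concern the abstract assigns to the data placement strategy.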
Pages: 9
Related papers
50 in total
  • [1] Using data analytics to accelerate biopharmaceutical process scale-up
    Facco, Pierantonio
    Zomer, Simeone
    Rowland-Jones, Ruth C.
    Marsh, Douglas
    Diaz-Fernandez, Paloma
    Finka, Gary
    Bezzo, Fabrizio
    Barolo, Massimiliano
    [J]. BIOCHEMICAL ENGINEERING JOURNAL, 2020, 164
  • [2] Accelerating Big Data Analytics Using FPGAs
    Neshatpour, Katayoun
    Malik, Maria
    Ghodrat, Mohammad Ali
    Homayoun, Houman
    [J]. 2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015, : 164 - 164
  • [3] Accelerating big data analytics on HPC clusters using two-level storage
    Xuan, Pengfei
    Ligon, Walter B.
    Srimani, Pradip K.
    Ge, Rong
    Luo, Feng
    [J]. PARALLEL COMPUTING, 2017, 61 : 18 - 34
  • [4] Big Data: Scale Down, Scale Up, Scale Out
    Gibbons, Phillip B.
    [J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 3 - 3
  • [5] Scale-Up of Heterogeneous Catalysts
    Sussman, Victor J.
    Calverley, Edward M.
    Olken, Michael D.
    Anaya, Denise A.
    Hickman, Daniel A.
    [J]. CHEMICAL ENGINEERING PROGRESS, 2023, 119 (02) : 17 - 24
  • [6] SCALE-UP - HOW BIG IS BIG ENOUGH
    VANBRUNT, J
    [J]. BIO-TECHNOLOGY, 1988, 6 (05): : 479 - &
  • [7] An Analytical Framework for Estimating Scale-Out and Scale-Up Power Efficiency of Heterogeneous Manycores
    Ma, Jun
    Yan, Guihai
    Han, Yinhe
    Li, Xiaowei
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (02) : 367 - 381
  • [8] Accelerating Genetic Sensor Development, Scale-up, and Deployment Using Synthetic Biology
    Joshi, Shivang Hina-Nilesh
    Jenkins, Christopher
    Ulaeto, David
    Gorochowski, Thomas E.
    [J]. BIODESIGN RESEARCH, 2024, 6
  • [9] Big Data Analytics on Heterogeneous Accelerator Architectures
    Neshatpour, Katayoun
    Sasan, Avesta
    Homayoun, Houman
    [J]. 2016 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS), 2016,
  • [10] OVERCOMING SCALE-UP AND SCALE-OUT WITH AUTOMATION
    Bure, K.
    [J]. CYTOTHERAPY, 2013, 15 (04) : S18 - S18