Accelerating Big Data Analytics Using Scale-up/out Heterogeneous Clusters

Cited by: 1

Authors:

Li, Zhuozhao [1]

Shen, Haiying [2]

Ward, Lee [3]

Affiliations:

[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA

[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA

[3] Sandia Natl Labs, Livermore, CA 94550 USA

Source:

2019 28TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND NETWORKS (ICCCN) | 2019

Keywords:

DOI:

10.1109/icccn.2019.8847060

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Production data analytics workloads typically consist of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. Recent works advocate scale-up/scale-out heterogeneous clusters (in short, Hybrid clusters) to handle these heterogeneous workloads, since scale-up machines (i.e., machines built by adding more resources to a single machine) can process small jobs faster than simply scaling out the cluster with cheap machines. However, implementing such a Hybrid cluster raises several challenges for job placement and data placement. In this paper, we propose a job placement strategy and a data placement strategy to address these challenges. The job placement strategy places a job on either scale-up or scale-out machines based on the job's characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas across the two types of machines accordingly to increase data locality in a Hybrid cluster. We implemented a Hybrid cluster on Apache YARN and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster can reduce the makespan of the workload by up to 37% and the median job completion time by up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers.
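The job placement idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which job characteristics are used or how under-utilization is measured, so the input-size threshold `SMALL_JOB_GB`, the `UNDER_UTILIZED` cutoff, the slot-based `utilization()` measure, and all class and function names here are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    input_gb: float  # input data size; assumed to be the deciding characteristic


@dataclass
class Pool:
    name: str
    capacity: int  # hypothetical: number of jobs the pool can run at once
    jobs: list = field(default_factory=list)

    def utilization(self) -> float:
        # Hypothetical load measure: queued jobs relative to capacity.
        return len(self.jobs) / self.capacity


SMALL_JOB_GB = 10.0   # hypothetical threshold separating small and large jobs
UNDER_UTILIZED = 0.5  # hypothetical cutoff for an under-utilized pool


def place(job: Job, scale_up: Pool, scale_out: Pool) -> Pool:
    """Send small jobs to scale-up machines and large jobs to scale-out machines."""
    target = scale_up if job.input_gb <= SMALL_JOB_GB else scale_out
    target.jobs.append(job)
    return target


def rebalance(scale_up: Pool, scale_out: Pool) -> list:
    """Migrate jobs from an overloaded scale-up pool to an under-utilized scale-out pool."""
    moved = []
    while scale_up.utilization() > 1.0 and scale_out.utilization() < UNDER_UTILIZED:
        # Assumption: move the largest job first, since it benefits least from scale-up.
        job = max(scale_up.jobs, key=lambda j: j.input_gb)
        scale_up.jobs.remove(job)
        scale_out.jobs.append(job)
        moved.append(job)
    return moved
```

Calling `place` for each arriving job and `rebalance` periodically mirrors the two steps the abstract names (initial placement, then migration for load balance); the data placement strategy is not covered by this sketch.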


Pages: 9

