Accelerating Big Data Analytics Using Scale-up/out Heterogeneous Clusters

Cited by: 1

Authors:

Li, Zhuozhao [1]

Shen, Haiying [2]

Ward, Lee [3]

Affiliations:

[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA

[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA

[3] Sandia Natl Labs, Livermore, CA 94550 USA

Source:

2019 28TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND NETWORKS (ICCCN) | 2019

DOI:

10.1109/icccn.2019.8847060

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Production data analytics workloads typically consist mostly of jobs with small input data sizes, plus a small number of jobs with large input data sizes. Recent works advocate scale-up/scale-out heterogeneous clusters (Hybrid clusters for short) to handle these heterogeneous workloads, since scale-up machines (i.e., machines built by adding more resources to a single machine) can process small jobs faster than simply scaling out the cluster with cheap machines. However, implementing such a Hybrid cluster poses several challenges for job placement and data placement. In this paper, we propose a job placement strategy and a data placement strategy to address these challenges. The job placement strategy places a job on either scale-up or scale-out machines based on the job's characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas across the two types of machines accordingly to increase data locality in the Hybrid cluster. We implemented a Hybrid cluster on Apache YARN and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster can reduce the makespan of the workload by up to 37% and the median job completion time by up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers.

Pages: 9
