Accelerating Big Data Analytics Using Scale-up/out Heterogeneous Clusters

Cited by: 1
Authors
Li, Zhuozhao [1]
Shen, Haiying [2]
Ward, Lee [3]
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[3] Sandia Natl Labs, Livermore, CA 94550 USA
DOI
10.1109/icccn.2019.8847060
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Production data analytics workloads typically consist of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. Recent work advocates scale-up/scale-out heterogeneous clusters (Hybrid clusters for short) to handle these heterogeneous workloads, since scale-up machines (i.e., adding more resources to a single machine) can process small jobs faster than simply scaling out the cluster with cheap machines. However, implementing such a Hybrid cluster poses several challenges for job placement and data placement. In this paper, we propose a job placement strategy and a data placement strategy to address these challenges. The job placement strategy places a job on either scale-up or scale-out machines based on the job's characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas across the two types of machines accordingly to increase data locality in the Hybrid cluster. We implemented a Hybrid cluster on Apache YARN and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster can reduce the makespan of the workload by up to 37% and the median job completion time by up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers.
Pages: 9
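The job placement idea summarized in the abstract (route a job by its characteristics, then migrate work from scale-up machines to under-utilized scale-out machines) might be sketched roughly as follows. This is only an illustration, not the paper's algorithm: the names `Pool`, `place`, and `rebalance`, the 1 GiB size cutoff, and the 0.5 utilization threshold are all assumptions, since the abstract gives no concrete values or criteria. Data placement is not modeled here.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff between "small" and "large" jobs; the abstract gives none.
SMALL_JOB_BYTES = 1 * 1024**3


@dataclass
class Pool:
    """A group of machines of one type, modeled as a fixed number of job slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def utilization(self) -> float:
        return len(self.running) / self.slots


def place(job_id: str, input_bytes: int, scale_up: Pool, scale_out: Pool) -> str:
    """Send small-input jobs to scale-up machines and large ones to scale-out."""
    pool = scale_up if input_bytes < SMALL_JOB_BYTES else scale_out
    pool.running.append(job_id)
    return pool.name


def rebalance(scale_up: Pool, scale_out: Pool, low: float = 0.5) -> list:
    """Move jobs off an oversubscribed scale-up pool while scale-out is under-utilized.

    The `low` threshold is an assumed parameter for illustration only.
    """
    moved = []
    while len(scale_up.running) > scale_up.slots and scale_out.utilization() < low:
        job = scale_up.running.pop()  # migrate the most recently placed job
        scale_out.running.append(job)
        moved.append(job)
    return moved


if __name__ == "__main__":
    up, out = Pool("scale-up", slots=2), Pool("scale-out", slots=8)
    place("big", 50 * 1024**3, up, out)
    for j in ("s1", "s2", "s3"):
        place(j, 10 * 1024**2, up, out)
    print(rebalance(up, out), up.running, out.running)
```

Using input size as the only job characteristic is a simplification; a real scheduler on YARN would also have to account for resource requests, queueing, and where the job's data replicas live.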