Performance Optimization of Hadoop Workflows in Public Clouds through Adaptive Task Partitioning

Cited by: 0
Authors
Shu, Tong [1]
Wu, Chase Q. [1, 2]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] Northwest Univ Xian, Sch Informat Sci & Technol, Xian 710127, Shaanxi, Peoples R China
Funding
National Science Foundation (USA)
Keywords
SCHEDULING MALLEABLE TASKS; PRECEDENCE CONSTRAINTS; SCIENTIFIC WORKFLOWS; ALGORITHM;
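DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract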
Cloud computing provides a cost-effective platform for big data workflows, where moldable parallel computing models such as MapReduce are widely applied to meet stringent performance requirements. The granularity of task partitioning in each moldable job has a significant impact on workflow completion time and financial cost. We investigate the properties of moldable jobs and design a big-data workflow mapping model. Based on this model, we formulate a workflow mapping problem that minimizes workflow makespan under a budget constraint in public clouds. We show this problem to be strongly NP-complete and design i) a fully polynomial-time approximation scheme (FPTAS) for a special case with a pipeline-structured workflow executed on virtual machines in a single class, and ii) a heuristic for a generalized problem with an arbitrary directed acyclic graph-structured workflow executed on virtual machines in multiple classes. Extensive simulation-based results in Hadoop/YARN illustrate the performance superiority of the proposed solution over existing workflow mapping models and algorithms.
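To make the trade-off in the abstract concrete, the following is a minimal sketch of budget-constrained partitioning for a pipeline on a single VM class. It is not the authors' FPTAS or heuristic: it is a plain exhaustive search over a toy model. The time model (workload / k + overhead * k), the hourly billing rule, and all names (`job_time`, `job_cost`, `best_partitioning`, `max_tasks`) are assumptions made for illustration and do not come from the paper.

```python
from itertools import product
from math import ceil

# Toy model (assumed, not from the paper): a moldable job with `workload`
# hours of work is split into k parallel tasks, one per VM, and each task
# adds `overhead` hours of coordination cost.
def job_time(workload, overhead, k):
    return workload / k + overhead * k

# Assumed billing rule: each of the k VMs is charged per started hour.
def job_cost(workload, overhead, k, price_per_hour):
    return k * ceil(job_time(workload, overhead, k)) * price_per_hour

def best_partitioning(jobs, budget, price_per_hour=1.0, max_tasks=8):
    """Exhaustively search task counts for a pipeline of (workload, overhead) jobs.

    Returns (makespan, cost, task_counts) for the fastest plan whose cost
    stays within the budget, or None if no plan is affordable.
    """
    best = None
    for ks in product(range(1, max_tasks + 1), repeat=len(jobs)):
        cost = sum(job_cost(w, o, k, price_per_hour)
                   for (w, o), k in zip(jobs, ks))
        if cost > budget:
            continue  # violates the budget constraint
        # Jobs in a pipeline run one after another, so times add up.
        makespan = sum(job_time(w, o, k) for (w, o), k in zip(jobs, ks))
        if best is None or makespan < best[0]:
            best = (makespan, cost, ks)
    return best

if __name__ == "__main__":
    pipeline = [(8.0, 0.5), (4.0, 0.5)]  # hypothetical two-job pipeline
    print(best_partitioning(pipeline, budget=16))
    print(best_partitioning(pipeline, budget=100))
```

In this toy model a finer partitioning shortens a job only up to a point, after which the per-task overhead dominates, while the hourly rounding makes cost grow unevenly with the task count. The search is exponential in the number of jobs, which is one reason approximation schemes and heuristics are of interest for the real problem.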
Pages: 9
Related papers
24 in total
  • [1] Modeling, Optimization and Performance Evaluation of Scientific Workflows in Clouds
    Figiela, Kamil
    Malawski, Maciej
    [J]. 2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014 : 280
  • [2] Resource Provisioning for Task-Batch Based Workflows with Deadlines in Public Clouds
    Cai, Zhicheng
    Li, Xiaoping
    Ruiz, Ruben
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2019, 7 (03) : 814 - 826
  • [3] Performance and irregular behavior of adaptive task partitioning
    de Doncker, E
    Zanny, R
    Kaugars, K
    Cucos, L
    [J]. COMPUTATIONAL SCIENCE -- ICCS 2001, PROCEEDINGS PT 2, 2001, 2074 : 118 - 127
  • [4] Performance evaluation of parallel strategies in public clouds: A study with phylogenomic workflows
    de Oliveira, Daniel
    Ocana, Kary A. C. S.
    Ogasawara, Eduardo
    Dias, Jonas
    Goncalves, Joao
    Baiao, Fernanda
    Mattoso, Marta
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2013, 29 (07) : 1816 - 1825
  • [5] Evolving Multi-objective Strategies for Task Allocation of Scientific Workflows on Public Clouds
    Szabo, Claudia
    Kroeger, Trent
    [J]. 2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012
  • [6] Long-Term Performance Evaluation of Hadoop Jobs in Public and Community Clouds
    Aida, Kento
    Abdul-Rahman, Omar
    Sakane, Eisaku
    Motoyama, Kazutaka
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (06) : 1176 - 1184
  • [7] Performance Optimization of Budget-Constrained MapReduce Workflows in Multi-Clouds
    Cao, Huiyan
    Wu, Chase Q.
    [J]. 2018 18TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2018, : 243 - 252
  • [8] Performance optimization of computing task scheduling based on the Hadoop big data platform
    Li, Yang
    Hei, Xinhong
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022
  • [9] Storage-aware Task Scheduling for Performance Optimization of Big Data Workflows
    Ye, Qianwen
    Wu, Chase Q.
    Cao, Huiyan
    Rao, Nageswara S. V.
    Hou, Aiqin
    [J]. 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS, 2018, : 1095 - 1102
  • [10] Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds
    Yang, Shin-Jer
    Chen, Yi-Ru
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2015, 57 : 61 - 70