Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows

Authors
Mohamed, Nabeel [1]
Maji, Nabanita [1]
Zhang, Jing [1]
Timoshevskaya, Nataliya [1]
Feng, Wu-Chun [1]
Affiliation
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Keywords
PLATFORM; GALAXY; CLOUDMAN; TAVERNA; TOOL
DOI
10.1109/TrustCom.2014.97
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Hadoop framework has gained significant attention from the scientific community due to its applicability to large-scale data analysis in many areas. This analysis often involves multiple stages of processing, which in turn constitute a workflow. While some stages of a workflow are mandatory, others depend on the type of analysis to be done. In addition, a workflow may possess data dependencies between stages that must be enforced, and it may exhibit varying levels of sensitivity. The resources needed for such data analysis can range from a laptop to in-house clusters (or a private cloud) to a public cloud. Managing such workflows across such a gamut of computing resources is an unnecessarily arduous task for domain scientists. To address these challenges, we present Aeromancer, a feature-rich workflow manager for running MapReduce-based workflows that uses both client and cloud resources. Aeromancer offers an ensemble of features, including the simultaneous use of client resources (e.g., on-premises clusters) and public cloud resources; automatic data-dependency and data-transfer handling; intra-flow, on-demand cluster provisioning; and support for directed acyclic graphs (DAGs). To demonstrate its functionality, we apply Aeromancer to several bioinformatics pipelines as part of a "big data" case study in the life sciences, which seeks to increase the adoption of hybrid computing environments, including the emerging "client+cloud" computing model, for running data-intensive workflows.
Pages: 739-746
Number of pages: 8