Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows

Cited by: 1
Authors
Mohamed, Nabeel [1]
Maji, Nabanita [1]
Zhang, Jing [1]
Timoshevskaya, Nataliya [1]
Feng, Wu-Chun [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
Keywords
PLATFORM; GALAXY; CLOUDMAN; TAVERNA; TOOL;
DOI
10.1109/TrustCom.2014.97
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The Hadoop framework has gained significant attention from the scientific community due to its applicability to large-scale data analysis in many areas. This analysis often involves multiple stages of processing, which, in turn, constitute a workflow. While some stages of a workflow are mandatory, others depend on the type of analysis to be done. In addition, a workflow may possess data dependencies between stages that must be enforced, and it may exhibit varying levels of sensitivity. The resources needed for such data analysis can range from a laptop to in-house clusters (or a private cloud) to a public cloud. Managing such workflows across this gamut of computing resources is an unnecessarily arduous task for domain scientists. To address these challenges, we present Aeromancer, a feature-rich workflow manager for running MapReduce-based workflows that uses both client and cloud resources. Aeromancer offers an ensemble of features, including the simultaneous use of client resources (e.g., on-premises clusters) and public cloud resources; automatic data-dependency and data-transfer handling; intra-flow, on-demand cluster provisioning; and support for directed acyclic graphs (DAGs). To demonstrate its functionality, we apply Aeromancer to several bioinformatics pipelines as part of a "big data" case study in the life sciences, which seeks to increase the adoption of hybrid computing environments, including the emerging "client+cloud" computing model, for running data-intensive workflows.
Pages: 739-746
Page count: 8