Data Analytics in the Cloud with Flexible MapReduce Workflows

Cited by: 0
Authors
Goncalves, Carlos [1, 2]
Assuncao, Luis [1, 2]
Cunha, Jose C. [2]
Affiliations
[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal
[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal
Keywords
MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Data analytics applications are characterized by large data sets that undergo a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids, or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing a MapReduce application requires installing, configuring, or accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. It would be desirable to have more flexibility in adjusting such configurations to the characteristics of each application. Furthermore, composing the multiple phases of a data analytics application requires specifying all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytics application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. We present experimental results obtained when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
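To make the division of labour described in the abstract concrete, the following is a minimal single-machine sketch of a text mining workflow whose middle node is a MapReduce computation. It assumes nothing about the AWARD framework's actual API: the names `split_input`, `map_fn`, `reduce_fn`, `run_mapreduce`, and `run_workflow` are hypothetical, word counting stands in for the paper's text mining algorithms, and a local thread pool stands in for multiple cloud instances. Only the input processing, map, and reduce functions are "user code"; the shuffle (grouping by key) and the sequencing of phases are generic.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple

# --- User-supplied parts (hypothetical names, word count as the example) ---

def split_input(text: str, n_splits: int) -> List[List[str]]:
    """Input processing: drop blank lines and deal the rest into n_splits chunks."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return [lines[i::n_splits] for i in range(n_splits)]


def map_fn(line: str) -> Iterable[Tuple[str, int]]:
    """Map: emit (word, 1) for every lowercased, punctuation-stripped word."""
    for token in line.lower().split():
        word = token.strip(".,;:!?\"'()")
        if word:
            yield word, 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the partial counts collected for one word."""
    return key, sum(values)

# --- Generic machinery (not user code) ---

def run_mapreduce(splits, mapper, reducer, workers: int = 4) -> Dict[str, int]:
    """Map each split in a worker thread, shuffle pairs by key, then reduce."""
    def map_split(split: List[str]) -> List[Tuple[str, int]]:
        return [pair for line in split for pair in mapper(line)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_split, splits))

    groups: Dict[str, List[int]] = defaultdict(list)  # shuffle: key -> values
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


def run_workflow(phases: List[Callable[[Any], Any]], data: Any) -> Any:
    """Execute phases in order, feeding each phase's output to the next."""
    for phase in phases:
        data = phase(data)
    return data


# A three-phase workflow whose middle node is a MapReduce sub-workflow.
workflow = [
    lambda text: split_input(text, n_splits=3),              # input processing
    lambda splits: run_mapreduce(splits, map_fn, reduce_fn),  # MapReduce node
    lambda counts: sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3],  # top-3 words
]

SAMPLE_TEXT = "The cloud runs MapReduce\nthe workflow composes phases\nthe cloud scales."
result = run_workflow(workflow, SAMPLE_TEXT)
print(result)  # [('the', 3), ('cloud', 2), ('composes', 1)]
```

The thread pool only illustrates that map tasks over different splits are independent of one another; CPU-bound work or multi-machine execution would call for separate processes or distributed instances such as the EC2 deployment mentioned in the abstract. Keeping each phase a plain function is what lets the MapReduce node be swapped, reconfigured, or placed anywhere in the surrounding workflow.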
Pages: 8